AD-A128 546  A PRELIMINARY ANALYSIS OF HUMAN FACTORS AFFECTING THE
RECOGNITION ACCURACY OF A DISCRETE WORD RECOGNIZER FOR C3 SYSTEMS (U)
NAVAL POSTGRADUATE SCHOOL, MONTEREY CA.  H. W. YELLEN.  MAR 83

UNCLASSIFIED    F/G 17/2


NAVAL  POSTGRADUATE  SCHOOL 

Monterey,  California 




THESIS 




A  PRELIMINARY  ANALYSIS  OF  HUMAN  FACTORS 

AFFECTING  THE  RECOGNITION  ACCURACY  OF  A 

DISCRETE  WORD  RECOGNIZER  FOR  C3  SYSTEMS 

by 

Howard  William  Yellen 

March  1983 

Thesis  Advisor: 

G.  K.  Poock 

Approved  for  public  release;  distribution  unlimited 




SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

3. RECIPIENT'S CATALOG NUMBER:

4. TITLE (and Subtitle): A Preliminary Analysis of Human Factors Affecting
   the Recognition Accuracy of a Discrete Word Recognizer for C3 Systems

5. TYPE OF REPORT & PERIOD COVERED: Master's Thesis; March 1983

6. PERFORMING ORG. REPORT NUMBER:

7. AUTHOR(s): Howard William Yellen

9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Naval Postgraduate School
   Monterey, California 93940

11. CONTROLLING OFFICE NAME AND ADDRESS:
    Naval Postgraduate School
    Monterey, California 93940

12. REPORT DATE: March 1983

13. NUMBER OF PAGES:

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):

15. SECURITY CLASS. (of this report): Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:

16. DISTRIBUTION STATEMENT (of this Report):
    Approved for public release; distribution unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):

19. KEY WORDS (Continue on reverse side if necessary and identify by block number):
    Voice Recognition
    Human Factors
    Automatic Speech Recognition
    Statistical Significance

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Literature pertaining to Voice Recognition abounds with information
relevant to the assessment of transitory speech recognition devices.
In the past, engineering requirements have dictated the path this
technology followed. But other factors do exist that influence
recognition accuracy. This thesis explores the impact of Human Factors
on the successful recognition of speech, principally addressing the
differences or variability among users. A Threshold Technology T-600
was used for a 100 utterance vocabulary to test 44 subjects. A
statistical analysis was conducted on 5 generic categories of Human
Factors: Occupational, Operational, Psychological, Physiological
and Personal. How the equipment is trained and the experience level
of the speaker were found to be key characteristics influencing
recognition accuracy. To a lesser extent computer experience, time
of week, accent, vital capacity and rate of air flow, speaker
cooperativeness and anxiety were found to affect overall error rate.


Approved for public release; distribution unlimited.


A Preliminary Analysis of Human Factors Affecting the
Recognition Accuracy of a Discrete Word Recognizer
for C3 Systems


by


Howard William Yellen
Captain, United States Army
B.A., Temple University, 197[?]


Submitted in partial fulfillment of the
requirements for the degree of


MASTER OF SCIENCE IN SYSTEMS TECHNOLOGY
(COMMAND, CONTROL, AND COMMUNICATIONS)


from the

NAVAL POSTGRADUATE SCHOOL
March 1983

ABSTRACT


Literature pertaining to Voice Recognition abounds with
information relevant to the assessment of transitory speech
recognition devices. In the past, engineering requirements
have dictated the path this technology followed. But other
factors do exist that influence recognition accuracy. This
thesis explores the impact of Human Factors on the
successful recognition of speech, principally addressing the
differences or variability among users. A Threshold
Technology T-600 was used for a 100 utterance vocabulary to
test 44 subjects. A statistical analysis was conducted on 5
generic categories of Human Factors: Occupational,
Operational, Psychological, Physiological and Personal. How
the equipment is trained and the experience level of the
speaker were found to be key characteristics influencing
recognition accuracy. To a lesser extent computer
experience, time of week, accent, vital capacity and rate of
air flow, speaker cooperativeness and anxiety were found to
affect overall error rates.

TABLE OF CONTENTS

I.   INTRODUCTION

II.  COMPUTER RECOGNITION OF SPEECH
     A. OVERVIEW OF VOICE INPUT TECHNOLOGY
     B. THE VALUE OF SPEECH RECOGNITION
        1. Advantages of Speech Recognition
        2. Limitations of Speech Recognition
     C. APPLICABILITY OF COMPUTER RECOGNITION OF SPEECH
        1. Commercial Applications
        2. Military Applications

III. HUMAN FACTORS IN SPEECH RECOGNITION
     A. DEFINITION AND PURPOSE
     B. FACTORS AFFECTING RECOGNITION ACCURACY
        1. General
        2. Differences Between Speakers
        3. Differences Within Speakers
        4. Miscellaneous Factors

IV.  DESCRIPTION OF THE EXPERIMENT
     A. OBJECTIVES AND CONSTRAINTS
        1. Objectives
           a. Occupational Characteristics
           b. Operational Characteristics
           c. Personal Characteristics
           d. Physiological Characteristics
           e. Psychological Characteristics
        2. Constraints
     B. SUBJECTS
     C. EQUIPMENT
        1. Voice Recognition System
        2. Spirometer
        3. Peak Flow Meter
        4. Tape Recorder
     D. INSTRUMENTATION
        1. User Questionnaire #1
        2. User Questionnaire #2
        3. STAI Questionnaire
     E. EXPERIMENTAL DESIGN
     F. PROCEDURE
        1. Training
        2. Recognition Testing
        3. Vocabulary
     G. VARIABLES

V.   ANALYSIS AND RESULTS
     A. GENERAL
     B. OCCUPATIONAL CHARACTERISTICS
        1. Hypotheses
        2. Job Function
        3. Branch of Service
        4. Job and Service Satisfaction
        5. Previous Computer Experience
        6. Foreign Language Competency
     C. OPERATIONAL CHARACTERISTICS
        1. Hypotheses
        2. Method of Training
        3. Time of Day and Week
        4. User Experience
        5. Ease of Use
     D. PERSONAL CHARACTERISTICS
        1. Hypotheses
        2. Race
        3. Marital Status and Family Size
        4. Religious Preference
        5. Accent
        6. Place of Birth and Geographic Origin
        7. Level of Education
        8. Socio-economic Class
        9. Dental
     E. PHYSIOLOGICAL CHARACTERISTICS
        1. Hypotheses
        2. Age
        3. Height and Weight
        4. Vital Capacity and Rate of Air Flow
        5. Physical Condition
     F. PSYCHOLOGICAL CHARACTERISTICS
        1. Hypotheses
        2. Psychological Anxiety
        3. Speaker Cooperativeness
        4. Recognition Errors
        5. Attitudes Toward the Use of Voice
        6. Attitude Toward Computers and Information Processing
     G. VOCABULARY ERRORS

VI.  CONCLUSIONS

APPENDIX A: USER QUESTIONNAIRE #1
APPENDIX B: USER QUESTIONNAIRE #2
APPENDIX C: SELF-EVALUATION QUESTIONNAIRE
APPENDIX D: SELF-EVALUATION QUESTIONNAIRE
APPENDIX E: UTTERANCE LIST: TRAINING WEEK - WEEK #1
APPENDIX F: UTTERANCE LIST: WEEK #1
APPENDIX G: UTTERANCE LIST: WEEK #2
APPENDIX H: DATA COLLECTION FORM
APPENDIX I: MASTER LIST OF UTTERANCES
APPENDIX J: INDIVIDUAL SUBJECT RECOGNITION RATES

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST

LIST OF FIGURES

1.  Speech Recognition Model
2.  Processing Functions of a Speech Recognition System
3.  T-600 Speech Recognition Equipment
4.  Acoustic Sound Reduction Chamber
5.  Placement of the SHURE SM-10 Microphone
6.  Recording Spirometer
7.  Use of Recording Spirometer to Measure and Record Vital Capacity
8.  The Wright Peak Flow Meter
9.  Measurement of Speaker's Rate of Air Flow
10. AKAI Tape Recorder
11. Experimental Design
12. Mean Error Rate vs. Job Function
13. Mean Error Rate vs. Branch of Service
14. Mean Error Rate vs. Computer Experience
15. Mean Error Rate vs. Training Method
16. Trials versus Job Function
17. Trials versus Training Method
18. Mean Error Rate versus Accent
19. Mean Error Rate vs. Education
20. Mean Error Rate vs. Vital Capacity
21. Mean Error Rate vs. Rate of Air Flow
22. Scatter Plot for Vital Capacity
23. Scatter Plot for Rate of Air Flow
24. Mean Error Rate vs. State Anxiety (Week #1)
25. Mean Error Rate vs. State Anxiety (Week #2)
26. Mean Error Rate vs. Trait Anxiety
27. Mean Error Rate vs. Speaker Cooperativeness
28. Scatter Plot: Mean Error Rate vs. Question #4
29. Scatter Plot: Mean Error Rate vs. Question #6
30. Scatter Plot: Mean Error Rate vs. Question #6
31. Mean Error Rate vs. # Syllables (by Week)
32. Mean Error Rate vs. # Syllables (Overall)


LIST OF TABLES

I.      MILITARY APPLICATIONS FOR SPEECH RECOGNITION
II.     DIMENSIONS OF DIFFICULTY FOR SPEECH RECOGNITION
III.    SUBJECT CHARACTERISTICS
IV.     TEST FOR EQUALITY OF VARIANCES
V.      ANALYSIS OF VARIANCE FOR RECOGNITION ACCURACY
VI.     MEAN TOTAL ERROR RATES FOR JOB FUNCTION BY WEEKS
VII.    AFFECT BY BRANCH OF SERVICE
VIII.   AFFECT BY JOB/SERVICE SATISFACTION
IX.     AFFECT OF COMPUTER EXPERIENCE
X.      AFFECT OF COMPETENCY IN ANOTHER LANGUAGE
XI.     MEAN TOTAL ERROR RATES FOR METHOD OF TRAINING BY WEEKS
XII.    AFFECT OF TIME OF DAY AND WEEK
XIII.   AFFECT DUE TO USER EXPERIENCE
XIV.    AFFECT DUE TO EASE OF USE OF VOICE EQUIPMENT
XV.     AFFECT OF RACE ON RECOGNITION ACCURACY
XVI.    AFFECT OF MARITAL STATUS AND FAMILY SIZE
XVII.   AFFECT OF RELIGIOUS PREFERENCE
XVIII.  AFFECT OF ACCENT ON RECOGNITION ACCURACY
XIX.    AFFECT OF PLACE OF BIRTH AND GEOGRAPHIC ORIGIN
XX.     AFFECT OF LEVEL OF EDUCATION
XXI.    AFFECT OF SOCIO-ECONOMIC CLASS
XXII.   AFFECT OF PAST AND/OR PRESENT DENTAL CARE
XXIII.  AFFECT ON RECOGNITION ACCURACY DUE TO AGE
XXIV.   AFFECT OF HEIGHT AND WEIGHT ON RECOGNITION ACCURACY
XXV.    AFFECT OF VITAL CAPACITY AND RATE OF AIR FLOW
XXVI.   AFFECT ON RECOGNITION ACCURACY DUE TO PHYSICAL CONDITION
XXVII.  AFFECT ON RECOGNITION ACCURACY DUE TO ANXIETY
XXVIII. AFFECT OF SPEAKER COOPERATION AND PARTICIPATION
XXIX.   AFFECT OF RECOGNITION ERRORS
XXX.    AFFECT DUE TO ATTITUDES PERTAINING TO THE USE OF VOICE
XXXI.   AFFECT DUE TO ATTITUDES TOWARD COMPUTERS AND DATA PROCESSING


ACKNOWLEDGEMENTS


I wish to express my thanks to my thesis advisor,
Professor Gary Poock, for introducing me to the world of
voice technology, allowing me the independence to conduct
the experimentation as I desired, and for the competitive
challenge posed on the racquetball court; to CDR Chick
Hutchins for his expertise and advice in Human Factors and
for serving as second reader; to Jay Partin and Ellen
Roland for their practical advice; and to Paul Sparks for
his technical assistance and advice.

Finally, my sincerest thanks to my wife, Susan, for her
help, understanding and encouragement; and to my son,
Michael, who has spent the better part of three months
wondering where Dad was, for his special smile and big hug
when it was needed the most.


I. INTRODUCTION


The insistence and dependence upon state of the art
equipment has been a predominant characteristic throughout
the efforts within the Command and Control community.
Despite the penchant for newer, better, and more
sophisticated equipment, there must exist some measure of
emphasis on the personnel needed to train with, operate,
and maintain the readiness of such equipment. Personnel
considerations cannot be divorced from test programs
designed to identify optimal systems or equipment. When
these considerations are carefully examined, the data
obtained from such programs can be effectively used to
enhance personnel subsystem design and implementation.

A personnel subsystem test program is one which places
the requisite emphasis on personnel rather than equipment.
Kryter [Ref. 1] enumerates six objectives necessary for a
successful test program.

1. To evaluate whether the system can be operated,
maintained and controlled by the personnel assigned to
it.

2. To determine the effect of human performance on system
performance and vice versa. This objective is aimed
at discovering critical inadequacies in man-machine
interaction and subsequently identifying changes that
would improve their compatibility.

3. To develop valid qualitative and quantitative
personnel requirements, selection procedures, and
tables of organizational manning. How many and what
type of people will provide optimal effectiveness of
the man-machine interface?

4. To evaluate individual and/or long term operational
readiness and applicable training programs.

5. To evaluate training equipment and supporting
materials.

6. To evaluate job aids, technical publications and other
tools for training and for assisting on-the-job
performance.

Increased productivity through automation involves two
major issues: technological and human. Speech is a uniquely
human capability. Speech recognition by a computer involves
getting a machine to accept, recognize, and correctly
respond to spoken messages. This machine must take the
input speech, compare it against the expected pronunciation
for allowable utterances, identify the intended message or
utterance, and produce the correct and appropriate response.
To adequately implement the capabilities of such a
technology, the objectives above become all the more
relevant. Of paramount importance is the human, for it
takes people to make all this automation work.

Speech recognizers commercially available today are
effective only within narrow limits. They have relatively
small vocabularies and 'frequently' confuse words. Within
this context, it becomes incumbent upon the user to develop
the skill to talk to the recognizer [Ref. 2]. As
such, a recognizer's performance will vary widely from
speaker to speaker.

Much of the work in speech recognition has centered on
the development and improvement of speech recognition
devices. For example:

-- Linear Predictive Coding (LPC) in the early 70's

-- Dynamic programming

-- Development of 1 million bit/sec processors

A user's experience notwithstanding, the human variable in
recognition performance remains strong. This has often been
observed in the past and even led to a description of user
categories [Ref. 2: p. 20] as 'sheep' and 'goats'. These
speech recognition systems work well for the 'sheep', but the
majority of the problems are created by a small segment of
the population: the 'goats'.

Recognizing the significant impact that engineers have
had on the continued technological advancement of speech
recognition, it is nevertheless critical to remind
ourselves of the interdisciplinary nature of speech
recognition. Besides engineering, the total discipline of
speech sciences and technology includes such traditional
disciplines as psychology, linguistics, anatomy and
physiology, computer sciences and human factors. This
thesis endeavors to examine the impact of human factors on
the successful recognition of speech, principally addressing
the differences or variability among users.

First, the modality of voice input will be examined,
citing some of the more readily apparent advantages and
disadvantages, and an overview provided as to its potential
applicability in a Command and Control environment. With a
general appreciation of speech recognition (the term 'voice
recognition' is synonymous and used interchangeably within
this document) in hand, the variety of human factors that
can affect the successful recognition of speech by a machine
will then be summarized. Subsequently, the experimental
methodology used to examine and differentiate speech
recognition equipment users will be presented. Lastly, the
experimental results will be presented and an analysis
provided of the correlation of each variable examined to its
associated error rates as well as an analysis of variance.


II. COMPUTER RECOGNITION OF SPEECH


A. OVERVIEW OF VOICE INPUT TECHNOLOGY

Speech recognition can be considered as a subset of a
broader field known as Speech Understanding. Speech
Understanding Systems (SUS) have the objective of
interpreting the intent of the speaker whether or not the
user's speech is grammatically correct or well formed.
While Speech Recognition Systems (SRS) are primarily
interested in the correct recognition of every word, SUS are
concerned with the meaning of entire conversational
segments.

Until now the only significant undertaking has been the
ARPA SUR project [Ref. 3], a five year effort with the
objective of obtaining a breakthrough in speech
understanding capability that would then allow the
development of practical man-machine communication systems.
Specifically, the objectives were to develop a SUS that
would accept continuous speech from many cooperative
speakers of a general American public; a system which used
syntactic analysis, semantics, pragmatic information and
prosodics to acquire an appropriate computer response.

The goals of speech recognition, in contrast, are less
ambitious. Instead of abstract concepts such as meaning or
understanding, SRS try to solve the more practical problems
of analyzing the acoustic waveform and applying pattern
recognition techniques in order to differentiate between
utterances [Ref. 4]. Figure 1 illustrates a typical speech
recognition model.

The acoustic speech signal is first analyzed to extract
such acoustic parameters as frequency spectrum and the
energy in different time segments. Next, information
carrying features are extracted that define various phonetic
events such as how noisy (fricative-like) the signal is,
positions of different vowel-like sounds and vibration of
the speaker's vocal cords. This information is then used to
divide the speech into time slices or segments, which are
labelled with phonetic categories. The phonetic sequence
for the input speech is matched to stored sequences of
expected pronunciations for the words in the lexicon or
dictionary, and the best matching sequences are determined
to be the most likely word(s) that had occurred in speech.

Figure 1. Speech Recognition Model
(From Reference 4)

Speech recognition systems can be considered as
belonging to one of two categories: continuous (connected)
or isolated (discrete) speech systems. Continuous systems
are those which can extract information from strings of
words even though the words run together as in natural
speech. Isolated systems require a short pause before and
after utterances that are to be recognized as entities. The
minimum duration of a pause is typically between 100-200
msec. An isolated word recognizer is also limited in the
duration of the spoken utterance, usually 2-4 seconds.
Continuous speech recognizers are just now beginning to
appear on the market, but they are expensive and their
capabilities and reliability have yet to be realistically or
practically evaluated. For the remainder of this thesis our
discussion will be confined to discrete recognition systems.
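
The pause and duration limits described above suggest how a discrete
recognizer might isolate utterances. The following is a minimal sketch
under stated assumptions, not the method of any particular recognizer:
the per-frame energy list, the 10 msec frame length, the energy
threshold, the 100 msec minimum pause and the 4 second maximum
utterance length are all illustrative values, and the function name is
hypothetical.

```python
def split_utterances(energies, frame_ms=10, threshold=0.1,
                     min_pause_ms=100, max_utt_ms=4000):
    """Return (start, end) frame ranges of utterances separated by pauses.

    energies: one energy value per analysis frame (hypothetical input).
    A silent run shorter than min_pause_ms is treated as part of the
    utterance; utterances longer than max_utt_ms are discarded.
    """
    min_pause = min_pause_ms // frame_ms   # pause length in frames
    max_len = max_utt_ms // frame_ms       # utterance limit in frames
    utterances = []
    start = None        # first frame of the utterance being tracked
    silent_run = 0      # consecutive quiet frames seen so far

    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                end = i - silent_run + 1   # one past the last loud frame
                if end - start <= max_len:
                    utterances.append((start, end))
                start = None
                silent_run = 0

    # Close an utterance that runs up to the end of the input.
    if start is not None:
        end = len(energies) - silent_run
        if end - start <= max_len:
            utterances.append((start, end))
    return utterances


# Two utterances: the 30 msec gap inside the first one is too short
# to count as a pause, the 150 msec gap separates the two.
signal = [0]*5 + [1]*20 + [0]*3 + [1]*10 + [0]*15 + [1]*8 + [0]*2
print(split_utterances(signal))
```

With these assumed values the sample input yields two frame ranges,
because only the longer gap meets the minimum pause requirement.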

Two other concepts of speech recognition to be discussed
are those of speaker independence and vocabulary size.
Speaker dependent systems are those which require speaker
adaptation (or 'training') in order to achieve recognition.
This is in contrast to speaker independent systems, which
will recognize speech regardless of the speaker. In terms
of speech recognition equipment and their associated
vocabularies, most recognizers work well with small
vocabularies of 10-50 words [Ref. 5: p. 80]. The
possibility of confusion between words increases as the
vocabulary size increases, and to some extent the chance of
similar sounding words increases with such larger
vocabularies.
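
One rough way to see why confusion grows with vocabulary size is to
count the distinct word pairs a recognizer must keep apart. This is an
illustrative calculation rather than a result from the cited
references; the function name and the vocabulary sizes are chosen only
as examples.

```python
from math import comb


def confusable_pairs(vocabulary_size):
    """Number of distinct word pairs that could, in principle, be confused."""
    return comb(vocabulary_size, 2)


for size in (10, 50, 100):
    print(size, confusable_pairs(size))
```

Going from a 10 word to a 100 word vocabulary raises the pair count
from 45 to 4950, so the opportunities for confusion grow much faster
than the vocabulary itself.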

At this juncture it is appropriate to expand our
definition of 'words' to encompass more than just individual
words. As used herein, 'word' is used interchangeably with
the term 'utterance' and may be either a single mono- or
polysyllabic word or a combination of mono- or polysyllabic
words joined into a phrase (e.g., Place-a-Circle-on-Moscow).


The four processing functions [Ref. 6] contained in a
limited vocabulary voice recognition system, as shown in
Figure 2, consist of a transducer, preprocessor, feature
extractor, and a final decision-level classifier.

1. Transducer: The microphone is the interface between
the user and the system and converts the spoken phrase
into electrical signals that are analyzed by the other
components of the system.

2. Preprocessor: No matter how it is represented,
spectral information must be explicitly or implicitly
contained in all speech encodings. The initial
analyses produce parametric representations [Ref. 7]
and take place in the preprocessor. This segment of
the system transforms the speech signal in order to
enhance certain properties and make them more easily
detectable in a speech recognition system. The signal
is normalized in time by dynamic programming for
subsequent comparisons with various reference
patterns. Data compression removes any extraneous or
irrelevant information. Both time and frequency
domain analytical techniques are performed on the
input signal. Speech analysis is achieved by either
direct analog spectrum analysis via fast Fourier
transform (FFT) in the frequency domain, or linear
predictive coding (LPC) in the time domain.

3. Feature Extraction: The key processing function in a
pattern recognition system is the feature extractor.
The more optimal the set of acoustical features
extracted and sent to the classifier, the less complex
the classifier need be to achieve a given accuracy
level. This segment of the system produces a set
number of significant acoustical features (depending
on the individual recognizer), a few of which include
spectral slopes, phonetic classification, and initial
estimate of word boundary.

4. Classifier: The classification process is performed
in software using a minicomputer. When a speaker
issues an utterance, the encoded features and their
time of occurrence are stored in short term memory.
The duration of the utterance is broken into time
segments and the features reconstructed into the
normalized time base. Reference patterns, previously
input by the speaker for the system's vocabulary of
words, are compared to the feature occurrence patterns
and a 'best-fit' or 'closest-match' determined for a
word decision. The number of bits of information for
the feature map of each reference pattern is
determined by mapping the number of acoustic features
onto the number of time segments.

Figure 2. Processing Functions of a Speech
Recognition System (From Reference 6)

The first two processing functions are accomplished by a
hard wired preprocessor and feature extractor. This
achieves real-time processing since only the classification
function is performed in a general-purpose minicomputer
[Ref. 6: p. 177].
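
The feature map used by the classifier can be sketched as follows.
This is an assumption-laden illustration and not the actual algorithm
of the T-600 or any other recognizer: the frame format (one tuple of
0/1 feature flags per analysis frame), the function names, and the
rule that a feature is 'on' in a time segment if it occurs in any
frame of that segment are all hypothetical. The equal-slice time
normalization shown here merely stands in for whatever normalization a
real recognizer performs.

```python
def build_feature_map(frames, n_segments):
    """Normalize a variable-length utterance onto a fixed time base.

    frames: non-empty list of equal-length tuples of 0/1 feature flags.
    Returns n_segments tuples, one per normalized time segment.
    """
    n_frames = len(frames)
    n_features = len(frames[0])
    feature_map = []
    for s in range(n_segments):
        lo = s * n_frames // n_segments
        hi = max(lo + 1, (s + 1) * n_frames // n_segments)
        segment = frames[lo:hi]
        # Assumed rule: a feature is set if any frame in the segment has it.
        feature_map.append(
            tuple(int(any(frame[k] for frame in segment))
                  for k in range(n_features))
        )
    return feature_map


def feature_map_bits(n_features, n_segments):
    """Bits per reference pattern: one bit per feature per time segment."""
    return n_features * n_segments


# The same word spoken quickly and slowly maps to the same pattern.
fast = [(1, 0)] * 2 + [(0, 1)] * 2
slow = [(1, 0)] * 4 + [(0, 1)] * 4
print(build_feature_map(fast, 2), build_feature_map(slow, 2))
```

Under this sketch a recognizer with, say, 32 features and 16 time
segments (hypothetical figures) would store 512 bits per reference
pattern.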

A discrete word recognizer must be 'trained' for
individual talkers and/or words. This can be done by a user
simply speaking a set number of training samples into the
device to provide a reference set of features. The system
stores in memory the reference set of word features for each
word (utterance) the user has spoken. Once the system is
trained, the user may speak words into the device during
normal operation and these are compared with the stored
patterns. The 'closest fit' is selected as the recognized
word. This sequence of events is commonly partitioned into
the training and recognition modes of operation.
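
The two modes of operation can be illustrated with a short sketch. It
is a simplified illustration under stated assumptions: the class and
method names are hypothetical, each utterance is assumed to be already
encoded as a fixed-length list of 0/1 feature bits, the reference
pattern is formed by a per-bit majority vote over the training
samples, and 'closest fit' is taken to mean the smallest Hamming
distance. A real recognizer may combine samples and measure fit quite
differently.

```python
def hamming(a, b):
    """Number of bit positions in which two patterns differ."""
    return sum(x != y for x, y in zip(a, b))


class DiscreteWordRecognizer:
    """Toy speaker-dependent recognizer with training and recognition modes."""

    def __init__(self):
        self.references = {}   # word -> stored reference pattern

    def train(self, word, samples):
        """Training mode: build one reference pattern from several samples."""
        n = len(samples)
        # Per-bit majority vote across the training samples (an assumption).
        self.references[word] = [int(2 * sum(bits) > n)
                                 for bits in zip(*samples)]

    def recognize(self, features):
        """Recognition mode: return the word whose reference fits closest."""
        return min(self.references,
                   key=lambda w: hamming(self.references[w], features))


recognizer = DiscreteWordRecognizer()
recognizer.train("north", [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0]])
recognizer.train("south", [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]])
print(recognizer.recognize([1, 1, 0, 1]))
```

Because the references come from one speaker's own samples, a
different speaker's patterns may fit poorly, which is why such a
device must be retrained for each talker.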

There are two types of errors that can occur in speech
recognition. The first is a rejection, or the inability of
the recognizer to correctly classify an utterance. The
second, and in a practical sense more troublesome, is a
misrecognition. This occurs when the recognizer classifies
an utterance as something other than what was spoken.
Better recognizers usually have recognition algorithms
designed to reject rather than guess at questionable words.
Higher quality systems such as Threshold (Models 500 and
600) have error rates that are quite acceptable [Ref. 8, 9,
10]. Extensive experimentation has shown approximate error
rates to be between 0.2 and 11.4 percent [Ref. 6: pp. 179-
180]. Of course, what constitutes an acceptable error rate
is critically dependent upon the particular application and
data entry rate.
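
The distinction between the two error types can be made concrete with
a small sketch. The function names, the distance scores and the
rejection threshold are hypothetical; the sketch only shows how a
recognizer that rejects doubtful matches trades misrecognitions for
rejections, and how the two rates might be tallied from test trials.

```python
def decide(distances, reject_threshold):
    """Return the best-matching word, or None to reject the utterance.

    distances: dict mapping each vocabulary word to its match distance
    (smaller is better). The threshold is an assumed tuning parameter.
    """
    word = min(distances, key=distances.get)
    return word if distances[word] <= reject_threshold else None


def error_rates(trials):
    """Tally error percentages from (spoken, recognized) pairs.

    A recognized value of None denotes a rejection; any other value
    that differs from the spoken word is a misrecognition.
    """
    n = len(trials)
    rejections = sum(1 for _, r in trials if r is None)
    misrecognitions = sum(1 for s, r in trials if r is not None and r != s)
    return {
        "rejection": 100.0 * rejections / n,
        "misrecognition": 100.0 * misrecognitions / n,
        "total": 100.0 * (rejections + misrecognitions) / n,
    }


trials = [("north", "north"), ("south", None),
          ("east", "west"), ("west", "west")]
print(error_rates(trials))
```

Lowering the threshold in `decide` makes the sketch reject more
utterances, which reduces the chance of acting on a wrong word at the
cost of asking the speaker to repeat more often.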

B. THE VALUE OF SPEECH RECOGNITION

The Department of Defense has been very active in the
past few years in its efforts to assess the merits of
voice recognition with machines. Such locations as the
Naval Postgraduate School, Wright Patterson Air Force Base,
Rome Air Development Center, Naval Air Development Center
and assorted other agencies and contractors have conducted
extensive tests in order to examine human interaction with
machines through the use of voice input and other
modalities. In order to comprehend the need for further
research pertaining to voice input technology, it is
essential to review the advantages and limitations that this
type of technology offers. More importantly, it is
essential to understand its potential capabilities and
applications in a military environment. Is speech
recognition sufficiently beneficial (considering costs of
$200 to $80,000+), practical, and usable to justify the
continued expenditure of research and development funds
(6.1 and 6.4) and operational monies?

1. Advantages of Speech Recognition


Proponents of computer recognition of speech will
continually extol the virtues and unlimited possibilities
the technology offers. In an abbreviated fashion, the five
general advantages of voice input to machines may be
summarized as follows:

-- Natural communication

-- Training

-- Multimodal communication

-- Fast communication

-- Error reduction in data input

Speech is our most natural mode of communication.
It is a familiar, spontaneous and convenient method of
expressing one's thoughts, ideas, or intentions. Untrained
users of voice recognition systems, regardless of whether
they can read, write, type or keypunch, can all speak or
make sounds. These characteristics of the speech input
modality make it applicable for users at all general skill
levels, from systems engineers to computer operators to blue
collar workers on an assembly line.

A user of speech recognition equipment requires
little or no training. Users have only to restrict their
spoken utterances to those which the machine can recognize.
In the case of discrete systems, isolated words are
separated by a short pause so as to ease the location of
word boundaries, and word choices are limited to those which
the machine has been trained to recognize. Although this
appears to be disadvantageous, it is more realistically a
compromise to natural speech in that no adverse effects are
caused the user in terms of operating the speech recognition
equipment.

Experimentation [Ref. 11: p. 60S] has shown that
speech, instead of interrupting communications necessary to
perform other tasks, can enable users to do these tasks
simultaneously with voice and thereby reduce or, at a
minimum, not add to the time required to perform a complex
task. The advantage of having one's hands and eyes free to
do other tasks is perhaps the pivotal point in the
determination of applicability of speech recognition
devices. This multimodal aspect allows us to place the
microphone anywhere (headset mounted, hand-held, on a stand)
and still communicate commands and information. Threshold
Technology even has a wireless microphone [Ref. 12] that
permits extensive mobility while talking to computers.

The fastest modality for communications by a human
is speech. An individual can speak twice as fast as the
average typist can type [Ref. 5: p. 45]. This has been
clearly demonstrated by Ochsman and Chapanis [Ref. 11], whose
experimental results showed that communication via
typewriter or handwriting could not approach speech in terms
of speed or task efficiency. Further substantiation from
the Naval Postgraduate School [Ref. 8: p. 2] showed that
voice entry was 17% faster than typing, after only three
hours of training. Additionally, while speech recognition
accuracy is slightly degraded by mental or motor loading of
the user [Ref. 13: p. 32], voice is nevertheless faster and
more accurate than other input modes when the user must
perform another task while simultaneously interacting with
the speech recognition equipment [Ref. 8: p. 2].

By now it is clear that speech recognition permits
data entry directly into the computer without intermediate
steps, such as manual transcription or keypunching, which are
subject to error. Again, research at the Naval Postgraduate
School has shown that 183% more errors occurred in manual
data manipulation (typing) than by voice [Ref. 8: p. 2].
Such common entry errors as the transposition of digits,
which are usually caused by eye movement or other
distractions, are almost eliminated with the use of
automatic speech recognition [Ref. 14].

2.  Limitations  of  Speech  Recognition 

If a particular technology were devoid of errors or
practical limitations, we could assume universal application
and implementation. Although the advantages of speech
recognition are seemingly well established, there do exist
several problems associated with the ability to speak to
machines. These limitations include:

-- User variability
-- Constrained speech
-- Isolated speech
-- Breath noise
-- User confusion
-- Environmental factors

Speakers exhibit a wide range of personal
characteristics that add a significant measure of difficulty
in the ability of a machine to recognize speech. A
speaker's sex, geographic origin, and articulation
experience are just a few of the elements that result in a
user's variability. Consistency is also a key element in
successful recognition accuracy. A speaker may talk quite
differently in training the machine as compared to when he
or she may use it in a practical application. Additionally,
physical changes in the speaker such as age, physical
condition, stress (physical or emotional), or fatigue, to
name a few, can induce variability that will ultimately
affect successful recognition accuracy.

An isolated word recognition system imposes a
restricted (constrained) vocabulary, both in terms of size
and content, upon the user. This becomes a limitation when
we consider that most people are accustomed to speaking in
natural, fluent prose. Because of the limited vocabulary,
users must be careful of the types of words included for
recognition. The similarity of sound structures between
words (e.g., Nine vs. Time) adds a measure of confusion that
can subsequently affect overall performance. Design of
a vocabulary for a particular application is an important
and controllable factor in determining the acceptability of
voice input for a given task.
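
One way to make the vocabulary design step concrete is to screen a
candidate word list for similar-sounding pairs before training the
recognizer. The sketch below is an illustration only, not a method
taken from this study: it assumes each word has a phoneme
transcription (the transcriptions shown are hypothetical) and uses
edit distance over phonemes as a crude stand-in for acoustic
similarity.

```python
from itertools import combinations

# Hypothetical phoneme transcriptions, for illustration only.
VOCAB = {
    "nine": ["N", "AY", "N"],
    "time": ["T", "AY", "M"],
    "five": ["F", "AY", "V"],
    "delete": ["D", "IH", "L", "IY", "T"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def confusable_pairs(vocab, max_distance=2):
    """Return (word1, word2, distance) for pairs within max_distance edits."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(vocab.items()), 2):
        d = edit_distance(p1, p2)
        if d <= max_distance:
            pairs.append((w1, w2, d))
    return pairs
```

Calling confusable_pairs(VOCAB) flags "nine" and "time" as a pair that
a designer might want to separate, for example by substituting a
longer synonym for one of them. The distance cutoff is an assumed
parameter that would have to be tuned against real recognition data.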

Because isolated word recognizers depend
significantly upon the detection of a minimum pause between
words, word boundary detection becomes perhaps the single
most critical limitation. The usual method is to measure
changes in energy levels [Ref. 1]. An isolated word is
detected at a point where the energy in the acoustic signal
rises above a certain threshold. At the end of the word,
the energy drops, and the resultant silence indicates that
the utterance is over. But energy fluctuations are not
enough to detect all word boundaries, and thus advanced
detection techniques will have to involve detection and
inclusion of stop consonants within words, while eliminating
pauses due to 'lip-smacks' or breath noise.
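
The energy-threshold method described above can be sketched in a few
lines. This is a minimal illustration under stated assumptions, not
the algorithm of any particular recognizer: the frame length, the
threshold, and the minimum pause length are hypothetical parameters,
and the function names are invented for this example. A word is
closed only after several consecutive quiet frames, so that a brief
dip in energy (such as the closure of a stop consonant) does not
split a word in two.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_words(energies, threshold, min_pause_frames):
    """Return (start, end) frame index pairs, with end exclusive.

    A word starts when energy rises above threshold and ends only
    after min_pause_frames consecutive frames at or below it.
    """
    words = []
    start = None
    quiet = 0
    for i, e in enumerate(energies):
        if e > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # Exclude the trailing quiet frames from the word.
                words.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Signal ended before a full pause was observed.
        words.append((start, len(energies) - quiet))
    return words
```

Note that this sketch treats any frame above the threshold as speech,
so a loud exhalation or lip-smack would be reported as a word. That
is exactly the weakness of pure energy detection noted above.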

In a limited vocabulary, isolated word recognition
system, breath noise can be a serious problem [Ref. 6: p.
174]. An individual who is involved in little or no
physical movement while engaged with a voice recognition
system can achieve very high recognition accuracy. This
accuracy can soon deteriorate once the user begins to move
around. Inhaling will not cause any adverse effects when
using a close-talking, noise-cancelling microphone, but
exhaling will produce signal levels comparable to speech
levels. As physical activity increases so does one's
breathing rate, and as a result increased exhalation will
lead to the above mentioned deterioration in recognition
accuracy.

While voice input provides multimodal
communications, this particular advantage has an inherent
limitation in that the user can become confused as to what
mode to use. As a result, input modalities can become
confused and interfere with each other, so that the total
rate of information transfer may not be as high as the sum
of the rates possible with each separate modality.

Finally, the environment in which the speech
recognition device is placed may have an inadvertent effect
on recognition accuracy. For example, speech recognition in
an aircraft cockpit may be degraded due to engine noise or
conflicting voices emanating from aircraft radio
communications. Or, consider the placement of such
technology in a crowded military Command Center where its
reliability can be affected by background noise from other
members located in the nearby work space.

C. APPLICABILITY OF COMPUTER RECOGNITION OF SPEECH
1. Commercial Applications

The first voice input systems to be used by industry
were installed in late 1972 and early 1973 [Ref. 15]. These
early applications included:

-- quality control and inspection
-- automated material handling
-- direct voice input to computers

Their successful implementation was due in large part to
recognition accuracies that were greater than or equal to
the manual keying accuracies obtained from the same
personnel.

In most quality control and inspection processes the
inspector's hands and/or eyes are occupied in the inspection
task. Through the use of a voice recognition system it is
possible to combine the inspector's normal work requirements
with the simultaneous entry of all data measured and
observed. Owens-Illinois Corporation installed voice data
entry equipment in early 1973 for the inspection of color
television faceplates. Here was an application where the
inspector "had to manipulate, orient, and measure parameters
using gauges and meters". The requirement to simultaneously
record the measurement data also existed. In this example
the operator was able to achieve both tasks at once [Ref. 6:
pp. 182-183].

Voice entry has been utilized in recent years to
control the movement of materials such as parcels,
containers, baggage, etc. through distribution and sorting
centers. A voice controlled package routing system
installed by S. S. Kresge in November 1974 allowed just one
operator to handle each item, read the label, and speak the
destination code for each carton into his/her microphone.
Formerly this had been an operation that required two
persons and still resulted in the 'bunching' up of different
size packages. Following the installation of voice
activated sorting equipment, the bunching problem was
eliminated, productivity increased, and sorting errors were
reduced [Ref. 6: p. 165].

2. Military Applications

These applications may be placed in the general
categories of equipment and process control, field data
entry, data management, and cooperative man-machine tasks.
A more definitive classification was proposed by Beek et
al. in 1977 [Ref. 16] to include the general areas of
Security, Command and Control, Data Transmission and
Communication, and Processing Distorted Speech. Table I
provides a recapitulation of military tasks that could be
considered for speech recognition technology.

Of particular interest is the use of speech
recognition for Command and Control applications. The term
C3 (Command, Control, and Communications) refers to an
overall system comprising, as a minimum, these key elements:

a. Command Authority: The commander provides the central
authority, unity of purpose, and the overall concept
as to how operations will be conducted to accomplish
mission objectives.

TABLE I

MILITARY APPLICATIONS FOR SPEECH RECOGNITION
(From Reference 16)

I.   SECURITY
     A. Speaker Verification (authentication)
     B. Speaker Identification (recognition)
     C. Determination of emotional effects (i.e., stress)
     D. Recognition of spoken codes
     E. Secure access voice identification
     F. Surveillance of communication channels

II.  COMMAND AND CONTROL
     A. System control (ships, aircraft, situation displays, etc.)
     B. Voice operated computer input/output
     C. Data handling and record control
     D. Material handling (mail, baggage, publications)
     E. Remote control (hazardous materials)
     F. Administrative record control

III. DATA TRANSMISSION AND COMMUNICATION
     A. Speech synthesis
     B. Vocoder systems
     C. Bandwidth reduction
     D. Ciphering/coding/scrambling

IV.  PROCESSING DISTORTED SPEECH
     A. Diver speech
     B. Astronaut communication
     C. Underwater telephone
     D. Oxygen mask speech
     E. High 'G' force speech

b. Organization: This element provides the pathways
through which the plans, priorities, and directives of
the commander are provided to the force and through
which information pertaining to the forces can be
provided to the central authority. These pathways are
found at each echelon in the form of command posts,
operations centers, or command centers.

c. Communications: This provides the means for
transmitting plans, priorities, and orders to elements
of the force and the means by which the forces may
inform the Commander of their activities and needs.

d. Information: A key element that facilitates control
by confronting the Commander with only that
information required to support the decision-making
process. Information supports both the staff
planning and command decision-making process at all
levels.

The command centers that will provide the requisite
organizational framework perform several vital functions
for the Commander. First is the capability to communicate
securely, and preferably by voice, over a wide choice of
circuits. Secondly, each command center has the task of
integrating information which comes from its supporting
elements. A third capability provided by these centers is
the processing and display of information. The fourth
function, associated with the third, is the quick and
accurate dissemination of information, reports, and
directives for the Commander.

We are particularly interested in the function of
information processing and dissemination as it provides a
suitable application for computer recognition of speech.
Command center automation, resulting in more efficient
communications, will lead to increased productivity. In its
broadest sense, communication is the management of
information, and information, not paper, is the chief
product of the command center. Our C3 systems that are
designed and fielded for these centers, and speech
recognition as a component of such, can provide our
Commanders the capability to "observe", "decide", "act", and
"react" with speed, decisiveness and accuracy.

Navy feasibility studies sponsored by Naval
Electronics Command and conducted by Dr. G. K. Poock of the
Naval Postgraduate School examined the potential for voice
data entry for Command, Control, and Communications. Two
voice recognition systems were installed in 1980 at Fleet
Headquarters, Commander-in-Chief Pacific (CINCPACFLT) in
Hawaii to examine the benefits and limitations of voice
input for operation of the Worldwide Military Command and
Control Time-Sharing System (WWMCCS TSS) and the Ocean
Surveillance Intelligence System (OSIS) [Ref. 17: p. 24].

Poock has also demonstrated that using voice input
to exercise a typical scenario on the ARPANET, an
experimental network since 1969 employing packet switching
technology and connecting over 150 host computers, was
significantly faster and more accurate than entering the
commands manually [Ref. 6]. Twenty-four subjects followed a
fixed scenario of instructions where they accessed the
ARPANET, logged into different host computers, read
messages, sent messages, read files, transferred files
between host computers, deleted files and interconnected
host computers. Simulated command centers operating on this
network include the Naval Postgraduate School (Monterey,
California), Naval Ocean Systems Center (San Diego,
California) and CINCPACFLT (Hawaii).

Automatic speech recognition has also been found to
have considerable potential for imagery interpretation and
intelligence report generation [Ref. 17: p. 49]. A
significant amount of research has been performed for the
Defense Mapping Agency (DMA) for such applications as voice
data entry for the processing of Digital Landmass System
(DLMS) data, preparation of Flight Information Publication
(FLIP) data, and ocean-depth measurements for digitized
cartographic applications. In all these applications the
operator's hands are busy, and the work frequently involves
the use of stereo optics and other special devices. Voice
has been shown experimentally to be a faster, easier, and
less fatiguing mode of data entry than historically more
conventional means [Ref. 17: p. 37]. More
recently, the feasibility and advantages of voice input
technology were described for use in the COINS Network
Control Center (CNCC). The Community On Line Intelligence
System interconnects on-line information storage and
retrieval systems located at a number of locations within
the United States intelligence community [Ref. 18].


III.  HUMAN  FACTORS  IN  SPEECH  RECOGNITION 


A.  DEFINITION  AND  PURPOSE 

Human factors is concerned with improving the
productivity of the user by taking into account human
characteristics in the design of a system. As described by
Huchingson [Ref. 19: p. 4],

The term "human factors" is more comprehensive, covering
all biomedical and psychosocial considerations applying
to man in the system. It includes not only human
engineering, but also life support, personnel selection
and training, training equipment, job performance aids,
and performance measurement and evaluation.

The people referred to in this definition are those who
typically operate, maintain or service the system. They are
those who will interact with the system's design. When the
focus is on a broader interpretation it is appropriate to
speak of a Human Factors Subsystem or Personnel Subsystem as
was described earlier.

Human factors engineering deals principally with the
many factors involved in the design of a new system, from
hardware to personnel. For our efforts in this analysis,
the current technology has been determined to be acceptable
and experimentally as well as operationally reliable for
use in a Command and Control environment. Now, user
variability is to be investigated further in terms of how it
affects recognition accuracy.

Since energy in a speech signal is usually displayed in
terms of frequency, intensity and time, it would seem
plausible that each word should have a unique acoustic wave
pattern and, if so, word recognition would be a simple
matter of the voice recognition system scanning the pattern,
comparing the sample pattern with a data bank of reference
word patterns, and deciding which word was spoken.
Unfortunately, human variability disrupts this
simplistic approach. Our purpose then is to discuss the
human as a component in a complex system designed by humans
and to note the fundamental advantages and limitations of
the human in relation to an automated voice recognition
system.
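
The idealized scheme just described, in which a sample pattern is
compared against a bank of reference patterns and the closest match
wins, can be sketched as follows. This is a toy illustration, not the
algorithm of the recognizer used in this study: it assumes each
utterance has already been reduced to a fixed-length feature vector,
uses plain Euclidean distance, and the reference values and the
rejection cutoff are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(sample, references, reject_above=None):
    """Return the reference word closest to sample, or None if rejected."""
    best_word, best_dist = None, float("inf")
    for word, pattern in references.items():
        d = distance(sample, pattern)
        if d < best_dist:
            best_word, best_dist = word, d
    if reject_above is not None and best_dist > reject_above:
        return None  # nothing in the data bank is close enough
    return best_word


# Hypothetical reference patterns; a real system uses many more features.
REFERENCES = {
    "yes": [0.9, 0.1, 0.4],
    "no": [0.2, 0.8, 0.5],
}
```

Human variability enters as displacement of the sample vector. The
further a speaker's current pronunciation drifts from the pattern
stored during training, the more likely it is that a different
reference word becomes the nearest match or that the utterance is
rejected outright.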

B.  FACTORS  AFFECTING  RECOGNITION  ACCURACY 
1. General

Limitation of vocabularies to 10£ words has
resulted in identification accuracies of between 98% and 99%
in a controlled laboratory environment. In an operational
or field setting, recognition accuracies have been reported
as low as £0% [Ref. 20: p. £26]. Various factors noted for
interfering with successful identification have included
background noise, inconsistent microphone placement,
insufficient training, inconsistent speaking style, and the
lack of user cooperation. Lea, in a paper titled "What
Causes Speech Recognizers to Make Mistakes?" [Ref. 21], calls
for the determination of those factors that influence
recognition accuracy rather than the repeated assessment of
transitory devices. Table II summarizes the four 'dimensions
of difficulty' Dr. Lea has proposed. What needs to be
accomplished is the characterization of the relative effects
of changes along each of these four dimensions or, more
simply stated, finding the factors influencing the accuracy of
machines that recognize speech.

Because there are so many variables involved that
affect recognition accuracy, the list in Table II may be
reorganized in a "communication-theoretic" framework. This
framework models the speech recognition error rate as a
function of seven complex sets of factors [Ref. 6: pp. 69-93]
that include:

—  Task  factors 
—  Human  Factors 
—  Language  Factors 
—  Channel  and  Environmental  Factors 
—  Algorithmic  Factors 
—  Performance  Factors 
—  Response  Factors 

TABLE II

DIMENSIONS OF DIFFICULTY FOR SPEECH RECOGNITION
(From Reference 5)

TASK AND PERFORMANCE REQUIREMENTS
     1. Form of speech to be recognized
     2. Accuracy requirements
     3. Required throughput rates
     4. Type of device necessary

HUMAN VARIABILITY
     1. Sex
     2. Dialect
     3. Vocal tract size
     4. Vocal cord characteristics
     5. Pronunciation habits of speaker
     6. Physical state
     7. Psychological state
     8. Workload
     9. Cooperativeness
    10. Time of day/week
    11. Time since training
    12. Number of training samples/word
    13. [illegible]

LANGUAGE DIFFICULTIES
     1. [illegible]
     2. Word length
     3. Word sound structure
     4. Confusability
     5. Language spoken
     6. Syntactic, semantic, and pragmatic constraints
     7. Enhanceability
     8. Stress pattern
     9. Intonational variability
    10. Rhythm and timing variability

ACOUSTIC DIFFICULTIES
     1. [illegible]
     2. Type(s) of noise
     3. Bandwidth
     4. Spectral distortions
     5. Transducer characteristics
     6. Placement of the transducer
     7. Amplitude
     8. Vibration
     9. Acceleration

It is the set of Human Factors that this experiment
and analysis is principally concerned with, for it is this
stage of the model that has a major impact on speaker
variability. This set of human factors can be further
subdivided [Ref. 21: p. 2] in order to monitor their
influence on recognition error rates. A few of these are
listed below:


-- Speaker Experience
-- Training Method
-- Sex of the Speaker
-- Physical Dimensions of the Speaker
-- Geographic Origin of the Speaker
-- Speaker Dialect
-- Physical State of the Speaker
-- Psychological State of the Speaker
-- Speaker Cooperativeness
-- Time of Day or Week


Because different speakers may demonstrate widely
varying methods of pronouncing words or phrases, the above
listed factors may be further separated into two categories:
those occurring between speakers and those affecting each
individual speaker. First, some of the differences between
speakers that induce variability will be briefly examined
and then the variabilities apparent within each speaker that
can affect recognition accuracy will be discussed.

2. Differences Between Speakers

Speaker Experience: This factor can take on a twofold
meaning when looking at it as a source of variability.
First is the experience of using voice recognition
equipment. Experienced voice recognition users should be
expected to have a higher and more reliable recognition
accuracy than those who are 'naive' to the technology.
These experienced users are comfortable using the equipment,
less likely to be intimidated by the system, and are
familiar with its performance capabilities from previous
usage. The other meaning of speaker experience has to do
with job skill. Can a user who operates in a microphone
environment on a daily or regular basis, such as an Air
Traffic Controller or a Pilot, be expected to have better
recognition rates than those who have never spoken into a
microphone? A data processor who works regularly in an
environment demanding precise data entry by keyboard might
have the type of experience or skill factor that would
provide an edge over a prospective user possessing only
basic typing skills. This type of experience overlaps
slightly with speaker cooperativeness and will be elaborated
upon later.

Method of Training: The ideal form of voice
interaction would be for a user to pick up the microphone,
speak commands the machine can understand, and for the
appropriate response to take place. Naturally, this is the
goal of speaker independent systems, but since humans all
speak differently and our form of speech recognizer is
discrete, we are obliged to provide the machine some
information about how we speak each word intended for our
desired vocabulary (i.e., training). The method by which the
machine is trained by the user will in large part dictate
subsequent recognition accuracy. If the user is closely
supervised and made to carefully speak the particular
vocabulary, then we should be able to expect higher
recognition rates than for the user who is given
cursory instructions on the use of the equipment and allowed
to go on independent of further supervision during the
training mode. An adjunct of training method is the number
of training 'samples' or pronunciation patterns. It is
difficult to achieve accurate speech recognition when the
number of training passes per word is small or smaller than
manufacturer specifications [Ref. 22]. Using identical
equipment, it would still be reasonable to anticipate some
speakers, having had fewer training samples per
word, having more success than others who have had more
samples per word.

Sex: Male voices have lower frequencies than
female voices, and a more detailed spectral structure results
from the lower pitch of their voices. This detailed structure is
more indicative of the vocal mechanism and of the intended
vowels and consonants spoken. Male voices tend to fare
better with recognizers employing frequency domain analysis
while female voices tend to have greater success with
machines using time domain analysis [Ref. 5]. A recent
comparison was conducted [Ref. 22] which revealed no
statistically significant difference between the sexes.
Although not a primary objective of the thesis, it remains a
source of variability that merits some measure of analysis.

Speaker Dialect: Dialects not only affect the
specific sound produced for each vowel or consonant type,
but also exhibit different dynamics of speech production.
For example, Southerners have their readily identifiable
drawl, whereas a New Yorker will tend to say "Toid" rather
than "Third" and residents of Cambridge, Massachusetts can
be heard to talk about "Hahvahd" instead of "Harvard".

Physical Dimensions: Throughout the literature on
speech recognition one will see speaker variability
attributed to a variety of factors, none of which include
the physical dimensions of the speaker. An examination of
the recognition accuracy for a selected sample population
based on physical dimensions would provide an interesting
insight into the ramifications of such a factor as a
component within a personnel selection subsystem. In other
words, what effect, if any, will height and weight have on
recognition accuracy?

Geographic Origin: This particular factor is
multidimensional, consisting of several sub-factors which
require careful examination:

-- Place of birth
-- Geographic area of upbringing
-- Ethnic background
-- Religious preference

The above may impose idiosyncratic or social differences in
habits which can produce variations in sound and
subsequently in pronunciation. These sub-factors all
contribute a measure of variety that can presumably affect
recognition accuracy.

3.  Differences  Within  Speakers 

Physical State: The present physical state of a
user of voice recognition equipment can precipitate
variability in his or her voice. For example, a cold, some
form of pathological condition, fatigue, etc. can alter the
speaker's voice. The individual's voice quality could be
different based on physical conditioning. Is the user who
works out regularly and stays in excellent physical
condition more likely to show higher recognition rates than
one who rarely exercises, smokes regularly and generally is
not in the best of health?

Psychological State: Spielberger [Ref. 23: p. 29]
defines transitory or state anxiety as a complex, unique
emotional condition that can vary in intensity and fluctuate
over time. State anxiety may be thought of as consisting of
unpleasant, consciously perceived feelings of tension and
apprehension with an accompanying activation or arousal of
the autonomic nervous system. The concept of trait anxiety
refers to the relatively stable individual differences in
anxiety proneness. It may also be a reflection of the
frequency and intensity with which state anxiety has been
previously manifested and the probability that such anxiety
will occur in the future [Ref. 23: p. 39]. The fact that
physiological functioning is affected during periods of
anxiety is readily apparent. The degree to which speakers
deal with state or trait anxiety may well be a significant
variable for consideration in the examination of error rates
of voice recognition systems.

Speaker Cooperativeness: How enthusiastic and/or
willing a speaker is toward the use of voice recognition
equipment could induce speaker variability and hence
affect subsequent recognition accuracy. In a military
environment where many job positions are of a non-voluntary
variety, it is conceivable to expect the selection of voice
recognition users who are told to operate the equipment
regardless of their personal preferences. If the user
distrusts the technology or prefers manual entry, and is
still required to use voice, we have developed a
non-cooperative user. A non-cooperative user is, therefore,
one who is consciously
trying to undermine the successful operation of the machine.
The cooperative user is one who is willing to help the
machine by saying precisely what the machine wants and
pronouncing it in a clear and consistent manner. There is a
certain grey area surrounding this factor with the presence
of users who, although not consciously trying to confuse the
device, are not fully committed to "helping the machine" to
recognize the correct utterances.

Time of Day/Week: Each person's speech is variable
depending upon time of day, changing from morning to evening
and even changing progressively over a period of time [Ref.
5]. An examination of recognition performance over extended
periods of time [Ref. 24: p. 1] showed a statistically
stable performance over time (21 weeks) with no serious
degradation occurring as time elapsed. Nevertheless, a user
who has a gap in time between training and operational use
may forget any special ways he/she trained the machine. How
much of a gap is tolerable is a subject for future research.

4. Miscellaneous Factors

Some additional human factors that have been
proposed [Ref. 5] deserve a brief description. They have
been relegated to a separate section because, for one reason
or another (lack of equipment, current technical skills,
lack of measurable quantitative data, etc.), experimental
examination at the present time has been precluded. These
factors include:

—  Form  of  speech 
—  Speaker  dependence 
—  Rate  of  speech 
—  Vocal  tract  size 
—  Speaker's glottal spectrum

Form of speech refers to the type of voice
recognition system to be used, isolated or continuous.
Continuous systems, being a quantum step above isolated in
terms of complexity, bring about a greater opportunity for
speaker variability to manifest itself. Such things as
detection of word boundaries, slurring of speech (i.e., "didja"
vs "did you"), and prosodic characteristics could seriously
affect recognition accuracy because of the complications
which a continuous speech recognition system introduces.

A speaker independent system negates the requirement
for training, and thus variability between speakers becomes a
more critical factor for independent systems to contend
with. Independent recognizer performance will have to be
tailored to accommodate an unlimited number of potential
speakers and their associated variability.

The faster a person speaks, the more likely that the
expected pronunciation will be altered due to slurring,
deleted syllables, etc. If a machine is trained to one
form of pronunciation and at one particular rate of speech,
a differing rate in an application mode will cause an
increase in recognition difficulty. With an isolated word
recognizer to be used in the experimentation, requiring a
minimum of 100 msec pause between utterances, and utterances
not exceeding 2.0 seconds in duration, this particular
factor was not considered essential to the overall analysis.
It is, rather, an important factor in terms of continuous
recognition systems.

The size of the vocal tract will produce changes in
the formants of the speech signal; the smaller the vocal
tract the higher the formants. This can have an impact on,
for example, transmission through limited bandwidth
channels. Vocal cord characteristics also produce
interspeaker variability such as pitch or "resonant" quality
of the voice. Speakers with more "resonant" voices that
project well will be easier for recognizers to handle [Ref.


IV. DESCRIPTION OF THE EXPERIMENT


A. OBJECTIVES AND CONSTRAINTS

1. Objectives

As noted earlier, our overall objective was to
examine the human as a component in a complex system. In
narrower terms, this experimentation attempts to assess the
effect of differing occupational, operational, personal,
physiological, and psychological characteristics of a user
on the accuracy with which a currently available voice
recognition system will correctly interpret spoken
utterances. Subsequently, our discussion will address the
occurrence, if any, of existing quantitative parameters that
would enable us to differentiate between effective and
non-effective users of voice recognition systems.

The following specific characteristics are examined
in this thesis. Many of the individual characteristics, or
human factors, are self-explanatory, while others are
provided with a brief explanation and/or rationale for
selection.

a.  Occupational  Characteristics 

This set of parameters examines the possible
effect on recognition accuracy due to differences inherent
in a user's occupational skill or job (military or civilian)
background. Specific characteristics include:

—  Job function: Comparison of recognition rates
between microphone-experienced users (i.e., pilots,
air traffic controllers) and non-experienced users.

—  Branch of service: A factor with possible
consequences pertaining to its use in personnel
selection criteria.

—  Job satisfaction: A subjective evaluation by the
user as to his/her job satisfaction in their current
duty assignment and their satisfaction within the
Armed Services.

—  Previous computer experience: Computer-experienced
personnel (i.e., Data Processors) are expected to
have a better appreciation for the advantages of
voice input and thus be more conscious of their
efforts and positively motivated for higher
recognition accuracy.

—  Foreign language competency: Frequently military
and civilian members associated with ECU are
required to possess the capability to fluently speak
a foreign language. This ability is another factor
that could affect one's speech.

b.  Operational  Characteristics 

This set of parameters examines the possible
effect on recognition accuracy due to factors surrounding
the operational use of voice recognition equipment.
Specific characteristics include:

—  Training method: Analysis of recognition rates for
those users who are supervised during the training
mode compared to those who are allowed to train the
equipment individually.

—  Time of day and week: A determination of whether
the time frame in which a speaker trains the
recognizer will have any subsequent effect on
recognition accuracy.

—  Equipment experience: Comparison of recognition
rates between experienced users of voice recognition
equipment and those who have never used the
equipment before ("naive" users).

—  Ease of use: The operational simplicity of the
equipment could affect a speaker's performance. For
example, a speaker who considers the recognizer a
complex and operationally difficult device will be
less likely to devote his or her maximum effort to
their performance.

c.  Personal  Characteristics 

The following are various characteristics
considered to have a possible effect on an individual's
speech patterns, and hence on the recognition accuracy
of a voice system. These parameters include:

—  Race

—  Marital status and family size: A correlate of
psychological state and, although equally likely to
be included as a psychological characteristic, it is
considered here as a criterion for personnel
selection. Family size refers to the number of
offspring the user has as opposed to the size of the
family in which one was raised.

—  Religious preference/Ethnic background

—  Accent or dialect

—  Place of birth/geographic origin

—  Level of education

—  Socioeconomic class: Similar in nature to the
characteristic of marital status, but is considered
more for its merit in selection of personnel than for
its effect on individual speech patterns.

—  Dental or orthodontal care: Braces, corrections for
improper bite, or major oral surgery are considered
for their implication on the speech patterns of
those individuals and the resultant error rate.

d. Physiological Characteristics

These characteristics are also considered to
have an effect on speech and as a result are factors of
interest when examining recognition accuracy and speaker
variability. These parameters include:

—  Height

—  Weight

—  Age

—  Physical condition: A subjective evaluation by the
user of his/her current physical condition.

—  Rate of airflow: Measurement of ventilatory
function to provide a diagnosis of conditions
affecting voice. This measurement can also be used
as an indication of possible airway obstruction.

—  Vital capacity: The maximum amount or volume of air
which can be exhaled following maximum inhalation.
This measure provides an estimate of the amount of
air potentially available for the production of
phonation.

—  Speech training: Examines whether formal speech or
voice training affects recognition accuracy.

e. Psychological Characteristics

The current psychological state of a user, their
cooperativeness, and their personal attitudes toward
automation and voice all contribute toward the overall
effect on recognition accuracy. The particular parameters
investigated include:

—  Psychological anxiety

—  Speaker cooperativeness

—  Effect of errors on subsequent performance

—  Attitudes toward voice recognition equipment as a
time saving job aid

—  Attitudes towards computers and data automation

In effect, items 4-6 are related to speaker cooperativeness
in that how a user feels about computers and voice
recognition could impact on their willingness to reliably
support the use of voice recognition equipment.

2. Constraints

Accomplishment of test objectives was constrained
within the research facilities of the Naval Postgraduate
School. In the interest of time, experimentation was
limited to five weeks.

Because voice production is an extremely complex
event in which auditory, acoustic, and aerodynamic events
are produced by the interaction of physiological mechanisms,
it would be beneficial if we could measure as many vocal
parameters as possible in order to achieve a complete and
accurate picture of voice production, its associated
variability among speakers, and its correlate to voice
recognition accuracy. Lack of equipment, time, and/or
expertise precluded examination of such factors as:

—  Glottal waveform

—  Transfer function of the vocal tract

—  Sound-pressure level

—  Maximum duration of sustained phonation

—  Maximum frequency levels

—  Modal frequency level


B. SUBJECTS

Forty-four subjects participated in the experiment on a
volunteer basis. The group was composed of 25 military
officers, 17 military enlisted, and 2 civilians. The
military officers, representing the Army, Air Force, and Navy,
consisted of 21 males and 4 females, while the enlisted
personnel, representing the Army and Navy, consisted of 11
males and 6 females. The civilians included a professor from
the NPS Oceanography Department and an employee of the
Defense Manpower Data Center (DMDC) in Monterey. The rank
or grade of the military subjects ranged from O-2 to O-4 for
the commissioned officers, CW2 to CW3 for the Warrant
Officers, and E-3 to E-7 for the enlisted personnel. The
subjects' ages ranged from kZ to 47, with an average age of

It was desired that the speakers selected for the test
be representative of the population for which the recognizer
is to be used, in our case a Command and Control environment
and, in particular, a military command center. Subjects
taking part in the experiment were representative of this
environment as shown by the grade distribution and types of
military occupational specialties, although some of these
specialties are not readily apparent in current job
description (i.e., medical NCO).

Twenty-five of the subjects were from Fort Ord and
included a variety of backgrounds such as pilots, air
traffic controllers, signal officers, signal
non-commissioned officers (NCO's), and infantry platoon
sergeants. Five of the subjects were data processors: 2
from the Fleet Numerical Oceanographic Center in Monterey
and 3 from administrative offices of the Naval School.
Twelve subjects were students at NPS and enrolled in the
Command, Control, and Communications (C3) curricula. A wide
diversity in their backgrounds is illustrated by previous
job categories such as aviation, communications, systems
programming, communications maintenance, command and staff,
and nuclear engineering.

Twelve of the subjects had experience using voice
recognition equipment, having participated in previous voice
experimentation [Ref. b]. A summary of subject
characteristics is provided in Table III.

C. EQUIPMENT

1. Voice Recognition System

A Threshold Technology Inc., Model T-600 voice
recognition system was used to represent a commercially
available, state-of-the-art recognizer, one which has been
well documented as to its reliable recognition accuracy.
The T-600 is a speaker dependent, isolated word, speech
recognition device which automatically recognizes spoken
words and phrases. These words and phrases (utterances) may
be as brief as 0.1 second but will usually range from 0.25
to 1.65 seconds and must be separated by very short pauses of
0.1 second or more. The terminal allows a user to begin an
utterance before it has completed processing the previous
one, but in this experimentation rate of speech was
controlled by use of the READY indicator light located on
the tape cartridge unit. This light indicates when the
terminal is ready to accept the next utterance in both the
training and recognition modes LRer. 225J .


TABLE III

SUBJECT CHARACTERISTICS

SEX           SERVICE          LOCATION       VOICE
Male: 34      Army: 2?         Ft Ord: 25     Experienced Users: 12
Female: 10    Navy: t          NPS: 16        Naive Users: 32
              Air Force: 7     FNOC: 2
                               DMDC: 1

RANK
O-4: e     O-3: b     O-2: 5     CW3: 2     CW2: 3
E-7: t     E-6: 4     E-5: ?     E-3: 1     CIV: 2

OCCUPATIONAL BACKGROUNDS
Pilots: 2                      Air Traffic Controllers: 5
Data Processors: 5             Supply Officer: 2
Medical Officer: 1             Medical NCO: 1
Signal Officer: 3              Signal NCO: 3
Finance Officer: 1             Engineer NCO: 1
Operations Officer: 1          Professor: 1
Computer Systems Manager: 1
Graduate Students: 12 (which include)
    Pilots: 2
    Communications Officer: 2
    Communications Maintenance Officer: 2
    Systems Programmer: 1
    WWMCCS Programmer: 1
    Submarine Nuclear Engineer: 1
    Infantry Unit Commander: 1
    AUTODIN Supervisor: 1

The Threshold 600 in its standard configuration is
composed of the following four elements:

—  Terminal consisting of:
       analog speech preprocessor
       LSI-11 microcomputer
       digital RS-232 input/output interface

—  Standard CRT/Keyboard Display Terminal

—  Remote Voice Input Unit (Microphone preamplifier)

—  Tape Cartridge Unit

The terminal, CRT display, microphone preamplifier, and tape
cartridge unit were table mounted (Figure 3) within an
acoustic sound reduction booth (Figure 4). A conventional
SHURE model SM-10 "boom" microphone, supplied as standard
equipment with the T-600, was used. The microphone possesses
a special noise cancelling design which allows the T-600 to
perform accurately despite most extraneous background noises
(Figure 5).

Figure 5. Placement of the SHURE SM-10 Microphone


The speech preprocessor accepts the speech signal
input from the microphone preamplifier and passes it through
a spectral analyzer for word boundary detection. The
feature extractor monitors for 32 phonetically-relevant
features and converts these to digital signals. Words are
detected from occurrences of low energy. A minimum pause of
0.1 second must occur to prevent confusion between words.
Any breathing noise at the end of the word is removed. The
remaining speech is divided into 16 fixed time segments, and
features are reconstructed onto the normalized 16 segment
time base.
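
The time normalization described above can be illustrated with a short sketch. It is not the T-600's actual algorithm: the function name, the frame representation (a list of 0/1 feature flags per analysis frame), and the rule that a feature counts as present in a segment if it appears in any frame of that segment are all assumptions made for illustration.

```python
def normalize_to_segments(frames, n_segments=16):
    """Map a variable-length utterance onto a fixed number of time segments.

    frames: list of equal-length lists of 0/1 feature flags (hypothetical
    representation). A feature is set in a segment if it is set in any frame
    that falls inside that segment (an assumed combination rule).
    """
    if not frames:
        raise ValueError("utterance has no frames")
    n_features = len(frames[0])
    segments = [[0] * n_features for _ in range(n_segments)]
    for i, frame in enumerate(frames):
        # Proportional position of frame i on the normalized time base.
        seg = min(i * n_segments // len(frames), n_segments - 1)
        for f, bit in enumerate(frame):
            segments[seg][f] |= bit
    return segments
```

Whatever the length of the utterance, the result always has 16 rows, which is what lets a short and a long pronunciation of the same word be compared position by position.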

The microcomputer does a comparison of input signals
against stored reference patterns. Each word is represented
by 512 (16 x 32) bits of information. The closest fit
between an incoming template and the alternative stored
training templates is found, and that "closest" word is
declared the word identity, unless the score is so low that
no decision can be made and the utterance is rejected
outright. The vocabulary reference patterns are
established by the subject "training" the recognizer. This
is accomplished by the subject making a set number of
repetitions of the various vocabulary utterances.
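
A minimal sketch of closest-fit matching with rejection follows. The source does not state how the T-600 scores a fit, so the Hamming distance used here, the function names, and the rejection threshold are assumptions chosen only to make the idea concrete.

```python
def hamming(a, b):
    """Number of bit positions in which two equal-length templates differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(template, references, reject_above=128):
    """Return the closest vocabulary word, or None if the fit is too poor.

    references: dict mapping each word to its stored training template.
    reject_above: hypothetical threshold, not a documented T-600 value.
    """
    best_word, best_dist = None, None
    for word, ref in references.items():
        d = hamming(template, ref)
        if best_dist is None or d < best_dist:
            best_word, best_dist = word, d
    if best_dist is None or best_dist > reject_above:
        return None  # utterance rejected outright
    return best_word
```

A return value of None corresponds to the rejection case; any other return value is the word the terminal would declare, whether or not it is the word actually spoken.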

Once a match is found, the appropriate character(s)
are sent via the output interface to the CRT to indicate to
the user which utterance was recognized. These terminal
matches are further categorized as misrecognitions, where
the terminal's "closest" match to the reference vocabulary
was not precisely the same utterance spoken, or
recognitions, in which the utterance spoken is exactly
recognized and so reflected in the CRT output. Rejection of
an utterance is a third category and is indicated by an
audible "beep".
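
The three outcome categories can be tallied as in the sketch below. The names are hypothetical, a rejection is represented by None, and total errors are taken as the unweighted sum of misrecognitions and rejections, which is an assumption about how the two error types are combined.

```python
from collections import Counter


def score_utterance(spoken, output):
    """Classify one trial; output is None when the terminal rejects it."""
    if output is None:
        return "rejection"
    return "recognition" if output == spoken else "misrecognition"


def tally(pairs):
    """Count outcomes over (spoken, output) pairs for one test pass."""
    counts = Counter(score_utterance(s, o) for s, o in pairs)
    # Assumed unweighted combination of the two error categories.
    counts["total_errors"] = counts["misrecognition"] + counts["rejection"]
    return counts
```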

The remote voice input unit allows components to be
remotely located up to 2000 feet from the terminal processor
and provides the means to adjust the volume (amplification)
of the amplifier to accommodate the normal speaking voice of
each particular subject.

The tape cartridge unit is a digital tape recorder
used to store and recall application data and an individual
subject's vocabulary reference patterns. Once the data
cartridge is recorded it contains all the information
necessary to initialize the Threshold 600 terminal for each
subject. The T-600 is capable of storing a 256 word
vocabulary which may be recorded or loaded in a few minutes
using the tape unit.

2. Spirometer

A recording spirometer (Figure 6), a type of
gasometer, was used for measuring and recording vital
capacity. It consists of a metal tank containing a movable
piston with a water seal, air input line, exhaust valve for
resetting, ink stylus, and revolving cylinder for mounting
chart paper calibrated in cubic centimeters.

Figure 6. Recording Spirometer


As the subject breathes into the mouthpiece (Figure
7), air replaces water in the inner piston, which rises by an
amount proportional to the exhaled air. The subject, once
fitted with the mouthpiece, is given instructions to inhale
to the greatest extent possible and then exhale all the air.
This procedure was repeated three times and the average
vital capacity used for analysis purposes.

3. Peak Flow Meter

The Wright Peak Flow Meter was used to measure the
maximum air flow rate in a single forced expiration. The
instrument (Figure 8) consists of a pivoted vane, the
rotation of which is opposed by resistance of a spring. The
plastic mouthpiece fits into the radial inlet which leads to
the vane. Attached to the vane is a spindle and pointer.
The forced expiration causes the vane and pointer to rotate
until the maximum attainable flow has been reached. Once
reached, the pointer is held in position by a ratchet until
released by a reset button on the back of the device. The
scale is graduated in liters per minute in 5 liters/minute
divisions over a range of 60 to 1000 liters/minute.

Procedurally, the subject stands and holds the meter
in a vertical plane as depicted in Figure 9. He/she then
takes as deep a breath as possible, places the mouthpiece in
the mouth, grips it tightly with the teeth, and seals it
with his/her lips. The subject blows out as hard as
possible in a short, sharp expulsion of air. This procedure
was performed three times with the average noted as the
appropriate peak expiratory flow.

4.  Tape  Recorder 


An AKAI 4000 DS Mk-II magnetic tape recorder was
used for the recording, storage, and reproduction of speech
sounds (Figure 10). The device is a typical analog magnetic
tape recorder consisting of three basic parts. These
include the electronics of the system, the head assembly,
and the tape transport. These components take a phenomenon,
such as the speech sound, that changes in time and record
it as a continuous event.

Figure 10. AKAI Tape Recorder

Tapes were recorded for all 44 subjects during their
participation in the experiment. Subject to availability of
analytical software at NPS, further acoustical analysis
could be conducted on speaker variability that might
substantiate and support statistical conclusions.

D. INSTRUMENTATION

Three questionnaires were used to elicit the
evaluations, judgements, comparisons, attitudes, and
background history of the subjects participating in the
experimentation. The first two questionnaires were designed
[Ref. Z£] to provide the necessary information to delineate
subjects into various groups representing those human
factors discussed earlier. The third questionnaire was used
to measure state and trait anxiety levels during various
periods of the experiment. The questionnaires were
"author-administered" in order to provide clarification, if
needed, of any written instructions and to ensure that all
respondents completed the questionnaires correctly, giving
appropriate consideration to each item.

Three types of questionnaire items were used:
open-ended, multiple choice, and rating scale. The open-ended
items permitted the subject to express his/her answer to the
question in his/her own words. In all cases, these questions
required short (one or two words) objective replies. The
multiple choice questions allowed each respondent to choose
the appropriate answer from a list of several options.
These multiple choice questions include "dichotomous" items,
for example, those requiring only a YES or NO response.
Finally, rating scale items were used to obtain judgements
or attitudes about some object, concept, or system. These
questions permitted the assignment of various response
alternatives along an unbroken continuum or in ordered
categories along the continuum. Both a graphic scale,
allowing the respondent to place his/her judgement any place
along the line, and a numerical scale, confining the
subject's response to a discrete category along the
continuum, were employed.

1. User Questionnaire #1

User Questionnaire #1 (Appendix A) employs a
combination of question items including open-ended, multiple
choice, and graphical rating scale items. Questions 1-22
are designed to obtain information pertaining to
occupational, personal, and physiological characteristics.
Questions 23-4tf obtain attitudinal, comparison, and
evaluation information pertaining to occupational,
operational, physiological, and psychological
characteristics.

2.  User  Questionnaire  #2 

User Questionnaire #2 (Appendix B) utilizes a
combination of question items including multiple choice and
graphical rating scale items. Questions 1-3 obtained
information relative to physiological factors, while
questions 4-15 were items repeated from User
Questionnaire #1, designed to obtain attitudinal information
from the subjects after using speech recognition equipment
for four weeks.

3.  STAI  Questionnaire 

The State-Trait Anxiety Inventory (STAI) is
comprised of separate self-report scales for measuring two
distinct anxiety concepts: state anxiety (A-State) and
trait anxiety (A-Trait). This inventory was developed by
Spielberger et al. at Vanderbilt University and later
continued at Florida State University. It was reproduced
with the special permission of the Publisher, Consulting
Psychologists Press, Inc., Palo Alto, California.

The STAI A-Trait scale consists of 20 statements
(Appendix C) that ask people how they generally feel. The
A-State scale also consists of 20 statements (Appendix E)
but the instructions require subjects to indicate how they
feel at a particular moment in time. The STAI was designed
to be self-administered and was given individually to each
subject. Complete instructions are printed on each test
form, for both the A-Trait and A-State scales. There were no
time limits imposed for completion of the form. Although
many of the items have face validity as measures of anxiety,
the inventory was referred to as a Self-Evaluation
Questionnaire. Each subject responds to every STAI item by
circling the appropriate number to the right of each item
statement on the form. Scoring keys are depicted with each
scale in Appendices C and E l?ef. 2VJ .

E. EXPERIMENTAL DESIGN

A three-factor mixed design with repeated measures on
one factor was employed in this experiment. In
consideration of the wide variety of human factors to be
examined, the experiment was designed to allow an analysis
of three critical factors (occupational experience with
microphones, operational training method, and experience)
affecting recognition accuracy while simultaneously
gathering sufficient data to accomplish subsequent analysis
on individual characteristics of speaker variability. The
two between-group variables were microphone experience and
training method. The third factor, experience (Week #), was
the within-group variable. A summary of the experimental
design appears in Figure 11.
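
The layout of this mixed design can be sketched as a small data structure. The level labels below are assumptions for illustration; the use of three weekly levels follows the three-week testing period described in the Procedure section.

```python
from itertools import product

# Between-group factors (each subject belongs to exactly one cell).
MICROPHONE_EXPERIENCE = ("experienced", "non-experienced")
TRAINING_METHOD = ("supervised", "non-supervised")
# Within-group factor (every subject is measured at every level).
WEEKS = (1, 2, 3)


def between_groups():
    """All cells formed by crossing the two between-group factors."""
    return list(product(MICROPHONE_EXPERIENCE, TRAINING_METHOD))


def observations_for(subject_id, microphone, training):
    """Repeated-measures records for one subject: one per week."""
    return [
        {"subject": subject_id, "microphone": microphone,
         "training": training, "week": week}
        for week in WEEKS
    ]
```

Crossing the two between-group factors gives four cells, and each subject contributes one observation per week within a single cell.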

Figure 11. Experimental Design

F. PROCEDURE

1. Training

For the T-600, the training procedure consists of
entering 10 passes of each utterance into the voice
recognizer. A word list of 100 utterances (Appendix I) was
provided the subject, each utterance prompted on the CRT,
the 10 passes spoken, and then the next utterance on the
list would be prompted. Based on the experimental design,
subjects were divided into two groups: supervised and
non-supervised. Those supervised during training received
detailed instructions and close scrutiny on each of the 10
passes by the experiment administrator. If the subject
failed to clearly pronounce the utterance, if volume level
was insufficient, or if the required 0.1 second pause was
omitted, the word was immediately retrained. Non-supervised
subjects received the same instructions, a short
demonstration of the training procedure and, when ready,
were allowed to train the equipment individually with no
supervision by the experiment administrator.

Training was accomplished only during the first week
of the experiment. Subjects training in the morning (0720-
122 e hours) would subsequently test during those periods, and
likewise for those subjects training in the afternoon
(1400-I9f0 hours). Immediately after training, all subjects
made at least two passes of the entire 100 word vocabulary
(similar to a test session) to identify any problems in
training of a particular utterance. If the utterance was
correctly identified on both passes it was considered as
trained. However, if an error (either misrecognition or
non-recognition) occurred, a third pass was made. If fewer
than two of the three passes of any utterance were correct,
that utterance was retrained.

After the equipment was trained, each subject was
measured for vital capacity and peak flow rate. Finally,
User Questionnaire #1 was administered. Total time for the
training session averaged l.t hours per subject.
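
The post-training check described above (two passes, a third pass after any error, and retraining when fewer than two of the three passes are correct) can be written as a short decision rule. This is a sketch only; `verify_utterance`, `run_pass`, and `scripted` are hypothetical names, with `run_pass` standing in for one spoken pass of the utterance.

```python
def verify_utterance(run_pass):
    """Apply the post-training check to one utterance.

    run_pass: callable that performs one pass and returns True when the
    utterance is correctly recognized (hypothetical stand-in).
    """
    results = [run_pass(), run_pass()]
    if all(results):
        return "trained"
    # An error occurred, so a third pass is made.
    results.append(run_pass())
    return "trained" if sum(results) >= 2 else "retrain"


def scripted(outcomes):
    """Helper for illustration: replay a fixed sequence of pass outcomes."""
    it = iter(outcomes)
    return lambda: next(it)
```

For example, `verify_utterance(scripted([True, False, True]))` yields "trained", while a sequence with two failures yields "retrain".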

2.  Recognition  Testing 

Following training, subjects were tested on the
system. Each subject made 2 passes through the entire
vocabulary list on each of three days during the week.
Duration of the experiment was three weeks. During Week #1
the vocabulary list remained in the same order as during
training (Appendix E), while in Week #2 the order of the
utterances was reversed (Appendix F) and in Week #3 the
order was randomized (Appendix G). The purpose of this
change in vocabulary order was to reduce the effect of
learning due to repetitiveness, and thereby provide a more
realistic picture of speaker variability. Data was
collected in the form of recognitions, misrecognitions, and
non-recognitions using Appendix H.
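
The three weekly presentation orders can be illustrated with a
short sketch. The function name, the seed parameter and the sample
words are hypothetical; the actual orders used are those listed in
Appendices E, F and G.

```python
import random


def weekly_order(vocabulary, week, seed=0):
    """Return the presentation order of the vocabulary for a given week."""
    if week == 1:
        return list(vocabulary)            # same order as during training
    if week == 2:
        return list(reversed(vocabulary))  # reversed order
    if week == 3:
        shuffled = list(vocabulary)
        random.Random(seed).shuffle(shuffled)  # randomized order
        return shuffled
    raise ValueError("the experiment ran for three weeks only")


words = ["alpha", "bravo", "charlie", "delta"]  # hypothetical utterances
for week in (1, 2, 3):
    print(week, weekly_order(words, week))
```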

The STAI questionnaire for A-State scale measurement
was administered just prior to the first testing session
(Week #1, Trials 1-2) to determine anxiety levels prior to
using voice equipment. During Week #2 another STAI
questionnaire for the A-State scale was administered
following the first test session of that week. The final
STAI form, for the measurement of A-Trait scales, was
administered during Week #2. User Questionnaire #2 was
provided to each subject at the conclusion of the experiment.

3.  Vocabulary

It was desired that a test vocabulary similar to a
vocabulary intended for practical application in a military
environment be used. Of concern in the design of the
vocabulary was the fact that brief monosyllabic words are
more difficult to recognize than longer polysyllabic words
or phrases. A relatively equal distribution of words and
utterances containing a syllabic content ranging from 1 to
^5  syllables  was  selected  as  the  final  vocabulary.  The 
words were chosen both from previous experimentation [Ref.
23] and the author's military experience. Appendix I
provides a listing of the 100 utterances used in the
experiment and considered as representative of use in a
military command center.

G.  VARIABLES

The dependent variable in this experiment was total
errors, a linear combination of misrecognitions and
non-recognitions. Independent variables in the overall
experimental design are experience, job function, and
training method. Additional independent variables included
each of the individual human factor characteristics elicited
earlier.

Data was collected on the eleven subjects within each
group of the experimental design. Each subject made 600
utterances per week for a grand total of 1800 for the
experiment. Total utterances for the completed experiment
numbered 79,200 (44 x 1800).
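
The utterance counts follow directly from the test schedule given
under Recognition Testing (2 passes of the 100-word vocabulary on
each of three days per week, for three weeks). A minimal check of
that arithmetic, with constant names chosen here for illustration:

```python
PASSES_PER_SESSION = 2    # passes through the vocabulary per test day
VOCABULARY_SIZE = 100     # utterances in the vocabulary
SESSIONS_PER_WEEK = 3     # test days per week
WEEKS = 3                 # duration of the experiment
SUBJECTS = 4 * 11         # four groups of eleven subjects

per_week = PASSES_PER_SESSION * VOCABULARY_SIZE * SESSIONS_PER_WEEK
per_subject = per_week * WEEKS
total = per_subject * SUBJECTS

print(per_week, per_subject, total)  # 600 1800 79200
```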


V.  ANALYSIS  AND  RESULTS


A.  GENERAL 

All analyses were performed using the MINITAB
statistical package [Ref. 28]. Repeated measures analysis
of variance procedures were performed in accordance with
guidance provided by Bruning and Kintz [Ref. 29].
Non-parametric tests for significance between pairs of means,
several independent samples, and for trend analysis were
conducted utilizing procedures discussed by Conover [Ref.
30]. Additional parametric analysis followed procedures
prescribed by Ott [Ref. 31].

All mean error rates that appear in figures are of
untransformed data. Since the F test in an analysis of
variance is valid even with mild departures from the
assumption  of  equality  of  variances  IRef.  31:  p.  63k:], 
Hartley's Test for homogeneity of population variances was
used to determine whether an extreme case (unequal
variances) existed and thereby determine if a transformation
of data would be required to stabilize the variances.
Results of this test are presented in Table IV. The
assumption of equal variances is the basis for the use of
untransformed data in all subsequent analyses.

The correlation coefficient reported herein is
Spearman's Rho. Although the Pearson Product Moment
correlation coefficient 'r' is most commonly reported, it is,
however, a random variable, and as such has a distribution
function. Conover [Ref. 30] states that 'r' has no value as
a test statistic in nonparametric tests unless the
distribution is known.


TABLE IV

TEST FOR EQUALITY OF VARIANCES


DATA:

    s^2 (group I)   = 1947.42
    s^2 (group II)  = seee.sa
    s^2 (group III) = 2e2b.Bc
    s^2 (group IV)  = 5626.95

HYPOTHESES:

    H0: All population variances are equal
    H1: Not all population variances are the same

TEST STATISTIC:

    F_max = s^2_max / s^2_min = 2.895

DECISION:

    Level of significance: .05
    Tabulated value of F_max = 5.67
    CANNOT REJECT THE NULL HYPOTHESIS
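
Hartley's test described above can be sketched as follows. The
statistic is the ratio of the largest to the smallest sample
variance, compared against a tabulated critical value (5.67 in
Table IV). The function names and the sample data are illustrative
assumptions, not the thesis data.

```python
from statistics import variance


def hartley_f_max(groups):
    """Return F_max: largest sample variance divided by the smallest."""
    variances = [variance(group) for group in groups]
    return max(variances) / min(variances)


def variances_differ(groups, critical_value):
    """Reject equality of variances only when F_max exceeds the table value."""
    return hartley_f_max(groups) > critical_value


# Hypothetical error counts for two small groups.
example = [[1, 2, 3], [2, 4, 6]]
print(hartley_f_max(example))           # 4.0
print(variances_differ(example, 5.67))  # False
```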


B.  OCCUPATIONAL  CHARACTERISTICS


1.  Hypotheses 

The following hypotheses pertaining to the
occupational characteristics of speakers using voice
recognition equipment were tested:


a.  H0: Job function (microphone-experienced users
        versus non-microphone-experienced users)
        will have no effect on recognition accuracy.

    H1: Job function (microphone experience)
        affects recognition accuracy.


b.  H0: The branch of service the military member
        belongs to will have no effect on
        recognition accuracy.

    H1: Recognition accuracy is influenced by the
        branch of service of the user.


c.  H0: A user's attitude pertaining to his/her
        present job satisfaction will have no
        effect on recognition accuracy.

    H1: Job satisfaction affects recognition
        accuracy.


d.  H0: The degree of satisfaction a user derives
        from being a member of the military will
        not affect recognition accuracy.

    H1: Service satisfaction has an effect on
        recognition accuracy.


e.  H0: The amount of previous computer experience
        a user has had will not affect recognition
        accuracy.

    H1: Previous computer experience affects
        recognition accuracy.


f.  H0: Competency in a foreign language (bi- or
        multilingual) will have no effect on
        recognition accuracy.

    H1: Competency in a foreign language will
        affect recognition accuracy.

2.  Job  Function 

The results of the experiment for users with and
without microphone experience are shown graphically in
Figure 12. Microphone-experienced users fared only slightly
better than non-microphone-experienced users. The analysis
of variance (ANOVA) results in Table V substantiate this,
showing an F ratio of .277, which indicates no statistically
significant difference due to the user's job function. Thus,
the null hypothesis cannot be rejected.

Figure 12.  Mean Error Rate vs. Job Function


TABLE V

ANALYSIS OF VARIANCE FOR RECOGNITION ACCURACY


Mean total error rates for microphone and
non-microphone experienced users are summarized in Table VI.
The definitive decrease in error rates over time will be
discussed later in the review of operational characteristics.


TABLE VI

MEAN TOTAL ERROR RATES FOR JOB FUNCTION BY WEEKS

(in Percent)

3.  Branch of Service

Three branches of service were represented in the
experiment, with civilian subjects categorized as a fourth
branch. A Kruskal-Wallis test for k > 2 samples was used to
determine if any differences existed. Table VII provides
the synopsis of results. The null hypothesis, that branch
of service will not affect recognition accuracy, is clearly
rejected. Multiple comparisons were made to determine
between which pairs of means the differences occurred. The
results of this test indicated significant differences
between Army/Navy and Army/Air Force. Differences between
Civilian/Army, Civilian/Air Force, Civilian/Navy and
Navy/Air Force were not significant.
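
The Kruskal-Wallis statistic used for comparisons of this kind can
be sketched in a few lines. This is the standard rank-based H
statistic with a correction for ties, written here for illustration
only (the thesis analyses were run in MINITAB); the function names
and sample values are assumptions. The result is compared with a
chi-square value with k - 1 degrees of freedom, for example 7.815
for four groups at the .05 level.

```python
from collections import Counter


def average_ranks(values):
    """Map each distinct value to its average rank in the pooled sample."""
    counts = Counter(values)
    ranks = {}
    position = 1
    for value in sorted(counts):
        count = counts[value]
        ranks[value] = position + (count - 1) / 2  # mean of tied positions
        position += count
    return ranks, counts


def kruskal_wallis(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks, counts = average_ranks(pooled)
    rank_term = sum(
        sum(ranks[x] for x in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in counts.values())
    correction = 1 - ties / (n ** 3 - n)  # equals 1 when there are no ties
    return h / correction


# Hypothetical error rates for three small groups.
print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The sketch assumes that not all observations are identical;
otherwise the tie correction is zero and the statistic is undefined.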

Further  inspection  of  these  results  indicated 
possible  confounding  due  to  experience  with  voice 
recognition  equipment.  All  Air  Force  personnel  and  3  out  of 
E  Navy  personnel  were  experienced  users.  Segregating  the 
experienced and naive users into separate categories and
then reconducting the analysis for effect by branch of
service showed no statistical significance (Table VII).
Using the original hypotheses established, the null cannot
be rejected in either the naive-only or experienced-only
cases. Mean error rates by branch of service for all,
naive-only and experienced-only subjects are presented
graphically in Figure 13.


TABLE VII

                  ALL SUBJECTS     NAIVE            EXPERIENCED
Type of Test      Kruskal-Wallis   Kruskal-Wallis   Kruskal-Wallis
Alpha             .05              .05              .05
Test Statistic    11.90 **         2.79             .23
Critical Level    .0075            .25              .90

** = Significant at stated level of significance


[Figure: mean % error rate (vertical axis, 1.0 to 8.0) by
branch of service: Civilian, Army, Navy, Air Force]

Figure 13.  Mean Error Rate vs. Branch of Service

4.  Job  and  Service  Satisfaction 

Subjects were divided into four groups based upon
their subjective responses:

a.  Persons who disliked their jobs

b.  Those who were borderline or neutral in their
feelings

c.  Individuals who liked their present job

d.  Persons who indicated a very definite liking of
their job (liked their job very much)

The attained test statistic (Table VIII) leads to the
decision that the null hypothesis cannot be rejected. The
correlation coefficient between the two variables was not
significant, and it is concluded that there is no apparent
correlation between the satisfaction a user has for his/her
current job and how well that user will perform with voice
recognition equipment. This particular human factor is
nevertheless worthy of further examination in the future in
terms of users whose current job entails the day-to-day use
of voice equipment.


TABLE VIII

EFFECT BY JOB/SERVICE SATISFACTION

                          JOB SATISFACTION   SERVICE SATISFACTION
Type of Test              Kruskal-Wallis     Kruskal-Wallis
Alpha                     .05                .05
Test Statistic            4.60               .219
Critical Level            .20                .90
Correlation Coefficient   .016               .041

** = Significant at stated level of significance

In the analysis of the effect service satisfaction
has on recognition accuracy, the 2 civilians were removed
from the sample population. Subjects were now divided into
three groups based upon their subjective responses:

a.  Those who are unsatisfied or don't care

b.  Those who are reasonably satisfied

c.  Those who are very satisfied with their
respective service

The test statistic (Table VIII) reveals no significant
difference between groups and therefore the null hypothesis,
that the degree of satisfaction a speaker derives from being
in the armed services will not affect recognition accuracy,
cannot be rejected. Correlation between service
satisfaction and total error rates, as before, was not
significant, thus indicating little or no correlation
between the random variables.

5.  Previous Computer Experience

Subjects were subjectively divided into four groups
based upon their response to question #32 in User
Questionnaire #1 and included persons with:

a.  No experience

b.  Very little experience

c.  Some or moderate experience

d.  Considerable experience (data processors)

The analysis provided a test statistic (Table IX) which
resulted in the rejection of the null hypothesis and the
conclusion that previous computer experience will affect
recognition accuracy. Multiple comparisons were performed
to determine which pairs of means differed. Significant
differences occurred between users with no and considerable
experience, very little and moderate experience, and very
little and considerable experience. These results
demonstrate that possession of experience with data/keyboard
input procedures provides a higher recognition accuracy.
An explanation for this occurrence may be, for example, a
data processor's awareness of the time involved in manual
entry and of the associated error rate. The advantages that
voice input offers to computer-experienced personnel may
well be a psychological or motivational factor in addition
to its presence as an occupational characteristic.

These results are further substantiated by the
computed correlation coefficient. Performing a one-tail
test for negative correlation with the existence of mutual
independence as the null hypothesis, we were able to reject
this hypothesis and conclude that as computer experience
increases, recognition error rates will decrease (Critical
Level: << .001). Graphical representation of mean error
rates for the four groups is shown in Figure 14.
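
Spearman's Rho, the correlation coefficient reported throughout
this chapter, can be computed as the Pearson correlation of the
ranks. The sketch below is illustrative only: the function names
are assumptions, and the experience levels and error rates are
hypothetical, not the thesis data.

```python
def ranks_with_ties(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed samples."""
    rx, ry = ranks_with_ties(x), ranks_with_ties(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (spread_x * spread_y)


# Hypothetical data: experience level (0 = none .. 3 = considerable)
# against total error rate in percent.
experience = [0, 0, 1, 1, 2, 3]
error_rate = [9.1, 8.4, 7.9, 6.5, 5.2, 3.0]
print(spearman_rho(experience, error_rate))  # negative: errors fall as experience rises
```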


TABLE IX

EFFECT OF COMPUTER EXPERIENCE

                          COMPUTER EXPERIENCE
Type of Test              Kruskal-Wallis
Alpha                     0.05
Test Statistic            14.287 **
Critical Level            < .005
Correlation Coefficient   -.516 **

** = Significant at stated level of significance

Figure 14.  Mean Error Rate vs. Computer Experience

6.  Foreign Language Competency

Recognition accuracy was compared between two
groups, those with a fluent proficiency in a foreign
language and those without. Thirty-two subjects possessed no
capability in a second language, whereas 11 were competent
in one or more languages. The median total error rate for
both groups was 6.28%. A two-sample non-parametric test,
the Mann-Whitney, was performed to detect the existence of
any differences between the two groups. The computed test
statistic (Table X) clearly shows no significance at the .05
level and therefore the null hypothesis cannot be rejected.
The critical regions for this two-tail test included values
of the test statistic less than 672 or greater than 814.8.


TABLE X

EFFECT OF COMPETENCY IN ANOTHER LANGUAGE

                  FOREIGN LANGUAGE
Type of Test      Mann-Whitney
Alpha             0.05
Test Statistic    764.5
Critical Level    .3776

** = Significant at stated level of significance


C.  OPERATIONAL  CHARACTERISTICS

1.  Hypotheses

The following hypotheses apply to the operational
characteristics under which the subjects were tested.


a.  H0: The method of training a user for voice
        recognition operation (supervised versus
        non-supervised) will not affect recognition
        accuracy.

    H1: Method of training will affect recognition
        accuracy.


b.  H0: The time of day in which a user trains the
        equipment will not affect recognition
        accuracy.

    H1: Recognition accuracy of the user will be
        affected by the time of day in which he/she
        trains the voice recognizer.


c.  H0: The period of the week in which the user
        trains the equipment will not affect
        recognition accuracy.

    H1: The period of the week in which the
        equipment is trained will affect
        recognition accuracy.


d.  H0: Experienced users will acquire the same or
        greater error rates than inexperienced
        (naive) users.

    H1: Experienced users will have lower error
        rates than naive users.

    H0: Recognition accuracy will not be affected
        by weekly experience.

    H1: A user will demonstrate reduced error rates
        (decreasing trend) as experience with
        voice recognition equipment increases.


e.  H0: The operational ease with which voice
        recognition equipment may be used will have
        no effect on recognition accuracy.

    H1: Ease of use will affect recognition
        accuracy.


2.  Method of Training

The results of the experiment for users receiving
either supervised or non-supervised training are depicted
graphically in Figure 15. Users who received supervision in
the training mode fared significantly better than those who
did not. The analysis of variance (ANOVA) table in Table V
substantiates this claim, providing an F ratio of 4.668 and a
critical  level  of  approximately  .225.  Thus,  the  null 
hypothesis is rejected and we may conclude that the method
of training does affect recognition accuracy. Mean total
error rates for supervised and non-supervised users are
summarized in Table XI.

Figure 15.  Mean Error Rate vs. Training Method


TABLE XI

MEAN TOTAL ERROR RATES FOR METHOD OF TRAINING BY WEEKS

(in Percent)
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* 

1 

6.00 

! 
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FUNCTION 

i 

i 

i 

i 

£.22 

i 

i 

7.41 

i 

i 

j 

6.22 

3.  Time of Day and Week

Subjects were blocked by time of day, morning and
afternoon, and by time of week: early (Monday-Tuesday), mid
(Wednesday-Thursday) or late (Friday-Saturday). A
Mann-Whitney test was performed to determine if differences
existed between the two time of day groups. Morning users
had a median error rate of t. l % while afternoon users had a
6.e?% error rate. Because of equal sample sizes, a
parametric t-test was performed to confirm results of the
non-parametric test. The results presented in Table XII
will not allow us to reject the null hypothesis. Critical
regions for the Mann-Whitney test included values of the
test statistic less than 411.5 and greater than 578.5.
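
The Mann-Whitney rank-sum statistic and its two-tail critical
bounds can be sketched as below, using the large-sample normal
approximation. The function names are illustrative, and the group
size of 22 per time-of-day block is an assumption (44 subjects
split into two equal samples); under that assumption the lower
bound reproduces the 411.5 quoted above.

```python
from statistics import NormalDist


def rank_sum(sample, other):
    """Sum of the ranks of `sample` within the pooled data (average ranks for ties)."""
    pooled = sorted(sample + other)

    def average_rank(value):
        first = pooled.index(value) + 1
        last = first + pooled.count(value) - 1
        return (first + last) / 2

    return sum(average_rank(value) for value in sample)


def two_tail_bounds(n, m, alpha=0.05):
    """Normal-approximation bounds for the rank sum of the sample of size n."""
    total = n + m
    mean = n * (total + 1) / 2
    spread = (n * m * (total + 1) / 12) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * spread, mean + z * spread


lower, upper = two_tail_bounds(22, 22)  # assumed: 22 morning, 22 afternoon users
print(round(lower, 1), round(upper, 1))
```

The null hypothesis is rejected only when the observed rank sum
falls outside these bounds.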

With three groups in the time of week variable, the
analysis utilized the Kruskal-Wallis test for determination
of differences among the groups. The null hypothesis cannot
be rejected with a test statistic less than 5.99, the
Chi-square value with two degrees of freedom. The
correlation coefficient was found to be significant at the
0.05 level in a test for negative correlation. A premature
conclusion that training occurring in the latter portion of
the week would yield lower error rates appeared to be
counter-intuitive. It was thought that fatigue and the
interruption of a weekend would result in poorer training
efforts and hence lead to higher error rates in the future.
Upon further analysis, this reversed correlation was found
to be the result of possible confounding arising from the
large number of experienced users who trained in the later
period of the week. Eight out of thirteen late-week users
were experienced and, with their removal from consideration,
the correlation between time of week and total error rate
became statistically non-significant.


TABLE XII

EFFECT OF TIME OF DAY AND WEEK

4.  User Experience

Two sets of hypotheses in Section V.C.1.d are
incorporated into this phase of the analysis. The analysis
of the first set was performed using the Mann-Whitney test
and the associated results are summarized in Table XIII.
The median error rate for naive users was 7.26%, while
experienced users attained a 2.75% error rate. Both groups
had equal numbers of supervised and unsupervised users. The
correlation coefficient yielded one of the strongest
correlations between two variables within the experiment.
The null hypothesis can be rejected and it is therefore
concluded that experience will affect recognition accuracy.


TABLE XIII

EFFECT DUE TO USER EXPERIENCE

The analysis of the second hypothesis of V.C.1.d is
depicted graphically in Figure 16 (Trials by Job Function)
and Figure 17 (Trials by Training Method). In each case no
interaction is present, with the weekly error rate showing a
steady drop of approximately .8 to 1.4% each week. This
graphical interpretation is supported statistically by the
ANOVA presented in Table V. That is, the F ratio is well
above the 3.11 required for a level of significance of 0.05.
The null hypothesis is rejected and it is concluded that
users will improve (reduce) their error rates through weekly
iteration. This conclusion was further verified by
application of the Cox and Stuart Test for Trend. The
following comparisons were made between:

a.  Week #1 and Week #2

b.  Week #2 and Week #3

c.  Week #1 and Week #3

In all three cases, the null hypothesis, that there is no
downward trend, was clearly rejected.
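
A test for downward trend of this kind reduces to a sign test on
paired observations. The sketch below assumes that each subject's
error rate in the earlier week is paired with the same subject's
error rate in the later week; that pairing, the function name and
the sample values are assumptions made for illustration, not
details taken from the thesis.

```python
from math import comb


def downward_trend_p_value(earlier, later):
    """One-sided sign test for a decrease from `earlier` to `later`.

    Returns (decreases, usable pairs, p-value). Tied pairs are dropped.
    """
    decreases = sum(1 for a, b in zip(earlier, later) if b < a)
    increases = sum(1 for a, b in zip(earlier, later) if b > a)
    n = decreases + increases
    # Probability of at least `decreases` decreases among n pairs
    # when increases and decreases are equally likely.
    tail = sum(comb(n, k) for k in range(decreases, n + 1)) / 2 ** n
    return decreases, n, tail


# Hypothetical error rates (percent) for five subjects in two weeks.
week_1 = [8.0, 7.0, 6.0, 9.0, 5.0]
week_2 = [6.0, 5.0, 5.0, 7.0, 4.0]
print(downward_trend_p_value(week_1, week_2))  # (5, 5, 0.03125)
```

A small p-value leads to rejection of the null hypothesis of no
downward trend.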

5.  Ease of Use

Based on subjective responses by those participating
in the experiment, four groups were categorized. They
include:

a.  Users who consider voice recognition equipment
difficult to use.

b.  Those who had no opinion either way.

c.  Users who stated that voice equipment is easy to
use.

d.  Those who feel that voice recognition equipment
is very easy to use.

The results of this analysis are summarized in Table XIV.
The  test  statistic  is  less  than  the  Chi-square  value  of 
9.4ifc  with  three  degrees  of  freedom  and  therefore  the  null 
cannot be rejected. The computed correlation coefficient is
not significant at the 0.05 level.


TABLE XIV


EFFECT DUE TO EASE OF USE OF VOICE EQUIPMENT
+■ - - - - - + 


EASE  OF  USE 


Type  of  Test 

!  Krusicai-Wallis 

Alpha 

!  2.05 

! 

Test  Statistic 

i  4.814 

i 

Critical  Level 

J  v  vje 

i  ✓  •tj+j 

Correlation 

Coefficient 

\ 

!  . it ? 

**  -  Significant 

at  stated  level  cf  significance 

+ 


D.  PERSONAL  CHARACTERISTICS

1.  Hypotheses

The following hypotheses were tested pertaining to
the personal characteristics of voice recognition users:


a.  H0: Race of the user will not affect
        recognition accuracy.

    H1: A difference in recognition accuracy exists
        between users of different race.


b.  H0: The marital status of the user will not
        affect recognition accuracy.

    H1: A user's marital status will have an effect
        on his/her recognition accuracy.

    H0: Size of a user's family will not affect
        recognition accuracy.

    H1: Family size will have an effect on
        recognition accuracy.


c.  H0: The religious preference/background of a
        user will have no effect on his/her
        recognition accuracy.

    H1: A user's religious preference/background
        will affect recognition accuracy.


d.  H0: A person's accent will not affect his/her
        recognition accuracy.

    H1: Accent affects recognition accuracy.


e.  H0: The place of birth of a user will have no
        effect on recognition accuracy.

    H1: One's place of birth affects recognition
        accuracy.

    H0: The geographic origin of a person will not
        affect his or her recognition accuracy.

    H1: A person's recognition accuracy will be
        affected by geographic origin.


f.  H0: The level of education an individual has
        attained will not affect his/her
        recognition accuracy.

    H1: Education level of a user affects
        recognition accuracy.


g.  H0: The socio-economic class of a user will not
        affect recognition accuracy.

    H1: A user's recognition accuracy will be
        affected by socio-economic class standing.


h.  H0: Past oral surgery or orthodontal care will
        not affect recognition accuracy of the
        user.

    H1: Recognition accuracy of the user will be
        affected if he or she has undergone oral
        surgery or orthodontal care.

2.  Race

Two racial backgrounds were represented in the
sampled population. Thirty-eight Caucasian and six Negro
subjects participated in the experimentation. The median
total error rate was 6% for Caucasian personnel and 6.6% for
Negro users. A Mann-Whitney test was performed to detect
the presence of any difference between the two groups. The
calculated test statistic (Table XV) was not significant at
the .05 level and the null hypothesis cannot be rejected.
Critical regions for the test statistic in this two-tail
test were values less than 797 and greater than 912.


TABLE XV

EFFECT OF RACE ON RECOGNITION ACCURACY

** = Significant at stated level of significance


3.  Marital Status and Family Size

The sample population consisted of 14 single, 25
married, 2 divorced, and 2 other (separated, widowed)
personnel. A Kruskal-Wallis test for k > 2 samples was used
to determine if any differences in means existed between the
groups. Because the computed test statistic (Table XVI) is
less than 7.815, the tabulated chi-square value with 3
degrees of freedom, the null hypothesis cannot be rejected.
No correlation coefficient was computed for marital status
due to the nominal scale of measurement.


TABLE XVI

EFFECT OF MARITAL STATUS AND FAMILY SIZE

The sample population was subdivided into five groups
for family size, with a range from no children to subjects
having four or more children. A Kruskal-Wallis test was
again used to determine if a difference existed and, as
before, the null hypothesis cannot be rejected. The
computed correlation coefficient indicates mutual
independence between family size and total error rate of a
voice recognition user.

4.  Religious Preference

Although a diverse variety of religious preferences
was enumerated by participating subjects, some were pooled
to preclude numerous sample sizes of just one person. For
example, Methodist and Episcopalian were combined into the
Protestant category, and so forth. In all, six groups were
represented and included Catholic, Protestant, Jewish,
Baptist, No Preference and Others (those who could not be
readily grouped into one of the aforementioned categories).
Using the Kruskal-Wallis test to check for differences
between means, the obtained test statistic (Table XVII) does
not allow for the rejection of the null hypothesis.
Therefore, it may be concluded that the religious preference
of the user will not affect his/her recognition accuracy.


TABLE XVII

EFFECT OF RELIGIOUS PREFERENCE

                  RELIGIOUS PREFERENCE
Type of Test      Kruskal-Wallis
Alpha             0.05
Test Statistic    3.25
Critical Level    > .25

** = Significant at stated level of significance

5.  Accent

Ten subjects possessed some type of noticeable
accent, as determined by the subject and experiment
administrator. Seven were Southern and three were
categorized as Other (Spanish, Bostonian). Remaining
subjects were placed in a 'No Accent' group. The resultant
test statistic (Table XVIII) was slightly less than the
tabulated Chi-square value of 5.991 with two degrees of
freedom. As such, the null hypothesis cannot be rejected.
An additional check was accomplished by combining the two
accent groups into one generic entity and performing a
Mann-Whitney test to detect a difference between the two
groups. Again the null hypothesis cannot be rejected at the
stated level of significance. Correlation analysis was not
performed due to the nominal scale of measurement.


TABLE XVIII

EFFECT OF ACCENT ON RECOGNITION ACCURACY

* 

i 

1 

1  , _ _  _ 

1 

J 

1 

l 

ACCENT  ! 

(3  groups)  ! 

ACCENT 
( 7  groups) 

\ 

1 

i 

iType  of  Test 

1 

1 

1 

Kruskai-Wallis  ! 

Pann-Whi  tney 

1 

1 

!  Alpha 

1 

1 

1 

.05  ! 

.05 

1 

1 

1 

1  Test 

!  Statistic 

< 

t 

i 

1 

f 

1 

1 

G  nq  J 

w  •  1  w  t 

704 

1 

t 

1 

1 

i 

1 

!  Critical 

!  Level 

1 

1 

i 

i 

1 

.055  ! 

.09 

1 

• 

1 

1 

1 

!  **  =  Significant  at  stated  level 

cf  significance 

1 

i 

Although the null is not rejected, the critical level (.055)
is close to the stated level of significance (.05).
Thus, mean error rates are illustrated in Figure 18 for
further examination.

Figure 18.  Mean Error Rate vs. Accent


6.  Place of Birth and Geographic Origin

Subjects were asked to provide their state of birth
and their responses were subsequently classified into one of
the following six generic groups:

a.  Overseas

b.  Northeast United States

c.  Southeast United States

d.  Mid-Central United States

e.  Southwest United States

f.  Western United States

Applying the Kruskal-Wallis test to the compiled data, the
obtained test statistic (Table XIX) is insufficient to
reject the stated null hypothesis.

Because a person's place of birth is not necessarily
the environment in which that individual grew up (i.e.,
during ages 2-18), data pertaining to geographic origin was
also tested to determine if any negative effect would be
encountered. The geographic areas used were the same as
for place of birth. Calculated results point to the same
conclusion: the null hypothesis of Section V.D.1.e cannot
be rejected.

TABLE XIX

AFFECT OF PLACE OF BIRTH AND GEOGRAPHIC ORIGIN

+----------------+------------------+-------------------+
!                !  PLACE OF BIRTH  ! GEOGRAPHIC ORIGIN !
+----------------+------------------+-------------------+
! Type of Test   !  Kruskal-Wallis  !  Kruskal-Wallis   !
! Alpha          !       .05        !        .05        !
! Test Statistic !       5.32       !       4.09        !
! Critical Level !   [illegible]    !       > .25       !
+----------------+------------------+-------------------+
** = Significant at stated level of significance

7.  Level of Education


The sampled population was partitioned into the
following five categories:

a. High school graduates.

b. Individuals with 1 to 4 years of college but no
degree.

c. College graduates.

d. Individuals working toward a graduate degree.

e. Persons accorded a graduate degree such as a
Masters or Doctorate.

The data obtained from the five groups were tested
for any significant difference between groups. The test
statistic (Table XX) leads to the rejection of the null
hypothesis and the conclusion that level of education
affects the overall error rate for voice recognition users.
A relatively strong positive correlation exists with a
critical level of 0.00[?]. That is, as the individual
increased in level of education, a concomitant decrease in
error rate occurred.

Multiple comparisons between the various groups
showed the predominant influence to be graduate students.
Further examination indicated possible confounding due to
that group's prior experience with voice recognition
equipment. Eleven out of twelve graduate students were

TABLE XX

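AFFECT OF LEVEL OF EDUCATION

+----------------+------------------+-------------------+
!                ! EDUCATION (ALL)  ! EDUCATION (NAIVE) !
+----------------+------------------+-------------------+
! Type of Test   !  Kruskal-Wallis  !  Kruskal-Wallis   !
! Alpha          !   [illegible]    !    [illegible]    !
! Test Statistic !    14.200 **     !       4.18        !
! Critical Level !   [illegible]    !    [illegible]    !
! Correlation    !                  !                   !
! Coefficient    !     -.280 **     !    [illegible]    !
+----------------+------------------+-------------------+
** = Significant at stated level of significance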


experienced users. These experienced users were stripped
out of the sample and the Kruskal-Wallis test applied to
only those that were naive to voice technology. Using the
same hypotheses, the obtained test statistic does not allow
for the rejection of the null. This, and the recomputed
correlation coefficient, corroborate the theory of
confounding, and the earlier conclusion is now amended to
state that level of education will not affect recognition
accuracy. Mean error rates for all education levels are
shown graphically in Figure 19. Error rates for both the
total sample population and naive users only are included.

Figure 19. Mean Error Rate vs. Education
(categories: High School, 1-4 College, College Grad,
Grad Student, Grad Degree)


8.  Socio-economic Class

A variety of socio-economic classes were presented
to the participants for selection, with one of the following
five chosen by each subject:

a. Upper lower class

b. Lower middle class

c. Middle class

d. Upper middle class

e. Lower upper class

The analysis of total error rates for these five groups
(Table XXI) yielded a test statistic that would not allow
for the rejection of the null hypothesis, and it may be
concluded that socio-economic class will not affect
recognition accuracy. The negative correlation indicates
that individuals of a lower socio-economic class tend to
acquire higher error rates, although the coefficient is not
significant at the 0.05 level (critical level: 0.158).


TABLE XXI

AFFECT OF SOCIO-ECONOMIC CLASS

+----------------+----------------------+
!                ! SOCIO-ECONOMIC CLASS !
+----------------+----------------------+
! Type of Test   !    Kruskal-Wallis    !
! Alpha          !         0.05         !
! Test Statistic !         1.95         !
! Critical Level !         .83          !
! Correlation    !                      !
! Coefficient    !        -0.152        !
+----------------+----------------------+
** = Significant at stated level of significance

9.  Dental Care

Subjects were queried as to their history of dental
care, in particular, oral surgery and/or orthodontal
correction. Two groups resulted, upon whose data a Mann-
Whitney test was performed to determine if any difference
existed between them. The null hypothesis cannot be
rejected due to the computed test statistic (Table XXII).
Critical regions for the test statistic included values
greater than 714.86 and less than 835.21.

TABLE XXII

AFFECT OF PAST AND/OR PRESENT DENTAL CARE

+----------------+----------------+
!                !  DENTAL CARE   !
+----------------+----------------+
! Type of Test   !  Mann-Whitney  !
! Alpha          !      0.05      !
! Test Statistic !     638.50     !
! Critical Level !     .3643      !
+----------------+----------------+
** = Significant at stated level of significance
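
Critical regions of the kind quoted for the Mann-Whitney test can be approximated from the group sizes alone. The sketch below assumes the statistic is the rank sum of one group, a two-sided test at the .05 level, and the large-sample normal approximation without a tie correction; the thesis does not restate here which form of the statistic was used, and the group sizes in the example are hypothetical.

```python
import math


def rank_sum_critical_region(n1, n2, z=1.96):
    """Two-sided 5% bounds for the rank sum of the group of size n1.

    Uses the large-sample normal approximation (no tie correction).
    """
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    return mean - z * sd, mean + z * sd


# Hypothetical group sizes; the actual sizes are not restated in this section.
lower, upper = rank_sum_critical_region(n1=20, n2=30)
print(f"reject H0 if rank sum < {lower:.1f} or > {upper:.1f}")
```

A rank sum falling between the two bounds lies in the acceptance region, which is the situation reported for the dental care groups.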

E.  PHYSIOLOGICAL CHARACTERISTICS

1.  Hypotheses

The following hypotheses pertaining to various
physiological characteristics of voice recognition equipment
users were tested.

a. H0: The user's age will not affect his/her
       recognition accuracy.

   H1: Age will affect the total error rate of users
       of voice recognition equipment.

b. H0: The height and weight of an individual using
       voice technology will not affect overall
       recognition accuracy.

   H1: Recognition accuracy will be affected by an
       individual's weight.

c. H0: The vital capacity and rate of air flow of
       a user will not affect his/her recognition
       accuracy.

   H1: Recognition accuracy will be affected by a
       person's vital capacity and rate of air
       flow.

d. H0: The overall physical condition of the user
       will not affect his/her recognition
       accuracy.

   H1: Recognition accuracy will be affected by one's
       physical condition.

e. H0: Formal speech and/or voice training will
       not affect recognition accuracy.

   H1: A user's recognition accuracy will be
       affected by any formal speech or voice
       training/therapy.

2.  Age

The subjects ranged in age from 20 to 47 and were
divided into five groups for purposes of the analysis.
These groups and their mean error rates are:

a. 20 to 24  ([?].00%)

b. 25 to 26  (7.03%)

c. 27 to 31  (7.15%)

d. 32 to 35  (5.73%)

e. 36+       (6.10%)

These five groups were tested to detect differences
among their means. The obtained results (Table XXIII) show
that the null hypothesis, stated above, cannot be rejected
and that the two variables, age and total error rate, are
mutually independent.

TABLE XXIII

AFFECT ON RECOGNITION ACCURACY DUE TO AGE

+----------------+------------------+
!                !       AGE        !
+----------------+------------------+
! Type of Test   !  Kruskal-Wallis  !
! Alpha          !       0.05       !
! Test Statistic !       2.26       !
! Critical Level !      > .50       !
! Correlation    !                  !
! Coefficient    !      -0.05       !
+----------------+------------------+
** = Significant at stated level of significance

3.  Height and Weight

Subjects ranged in height from 60 to 77 inches. Four
groups were generated for analysis and are listed below with
their respective mean error rate.

a. 60 to 64 inches  (5.46%)

b. 65 to 69 inches  (6.67%)

c. 70 to 72 inches  (5.29%)

d. 73 to 77 inches  (7.14%)

The results of the analysis, as summarized in Table XXIV,
indicate that the null hypothesis cannot be rejected. The
small positive correlation coefficient is not significant at
the .05 level and thus the variables in question may be
considered to be independent.

Weights of the subjects ranged from 110 to 240
pounds. Examination for some natural 'break' points in this
range resulted in the creation of the following five groups
and their corresponding mean error rates.

a. 110 to 125 pounds  (6.40%)

b. 126 to 145 pounds  (6.65%)

c. 146 to 175 pounds  (5.13%)

d. 176 to 199 pounds  (7.16%)

e. 200+ pounds        (5.88%)

The null hypothesis cannot be rejected, with the correlation
coefficient indicating independence between the two
variables.

TABLE XXIV

AFFECT OF HEIGHT AND WEIGHT ON RECOGNITION ACCURACY

+----------------+------------------+------------------+
!                !      HEIGHT      !      WEIGHT      !
+----------------+------------------+------------------+
! Type of Test   !  Kruskal-Wallis  !  Kruskal-Wallis  !
! Alpha          !       .05        !       .05        !
! Test Statistic !       1.90       !       1.95       !
! Critical Level !      > .50       !       .75        !
! Correlation    !                  !                  !
! Coefficient    !       .121       !       .064       !
+----------------+------------------+------------------+
** = Significant at stated level of significance

The similarity in test statistics and correlation
coefficients of height and weight may be explained by
observing the correlation between height and weight itself.
A Pearson product moment correlation of .821 suggests a
strong positive association between the two variables and
thus serves to confirm the similar results of the analysis.
4.  Vital Capacity and Rate of Air Flow

The vital capacity of participating subjects ranged
from 1917 to 5725 cubic centimeters. The following four
groups were created:

a. 1917 to 2850 cubic centimeters

b. 2851 to 376[?] cubic centimeters

c. 3925 to 4450 cubic centimeters

d. 4658 to 5725 cubic centimeters

Analysis for differences between the means of the various
groups generated the test statistic (Table XXV) that
resulted in the rejection of the null hypothesis. A
correlation between increased vital capacity and low error
rates was found to be significant using a one-tail test for
negative correlation (critical level: .045).

The rate of airflow characteristic had a range of
212 to 731 liters per minute. This range was divided by
four and the following groups were used for the analysis:

a. 212 to 331 liters/min

b. 332 to 460 liters/min

c. 461 to 599 liters/min

d. 600+ liters/min

TABLE XXV

AFFECT OF VITAL CAPACITY AND RATE OF AIR FLOW

+----------------+------------------+------------------+
!                !  VITAL CAPACITY  ! RATE OF AIR FLOW !
+----------------+------------------+------------------+
! Type of Test   !  Kruskal-Wallis  !  Kruskal-Wallis  !
! Alpha          !       .05        !       .05        !
! Test Statistic !     8.58 **      !       6.38       !
! Critical Level !      .0375       !       .095       !
! Correlation    !                  !                  !
! Coefficient    !   -.26[?] **     !     -.318 **     !
+----------------+------------------+------------------+
** = Significant at stated level of significance


The test statistic does not allow for the rejection of the
null, but a statistically significant correlation
coefficient provides an indication that as rate of air flow
increases, error rates will decrease. Figures 20 and 21
depict mean error rates for effects due to vital capacity
and rate of airflow. Figures 22 and 23 provide the scatter
plots upon which the correlation coefficients were
determined.

Figure 20. Mean Error Rate vs. Vital Capacity

Figure 21. Mean Error Rate vs. Rate of Air Flow

Figure 22. Scatter Plot for Vital Capacity

Figure 23. Scatter Plot for Rate of Air Flow

The dilemma of a non-significant Kruskal-Wallis test
and a significant correlation coefficient can only be
explained by the subjective division of the range of flow
rates into the groups used for the analysis. Biased
grouping could provide a matrix that would yield a
significant test statistic to show a difference between
means but, in the final analysis, credibility for this
characteristic as a determinant in personnel selection would
be lost.
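
The sensitivity to grouping described above can be illustrated with a small, entirely hypothetical data set. In the sketch below the same eight subjects are split into two groups at two different cut points: one split yields a Kruskal-Wallis statistic above the 5% chi-square point for one degree of freedom (3.841), the other does not, while a rank correlation computed on the ungrouped values is unchanged. The use of Spearman's coefficient is an assumption; this section of the thesis does not name the coefficient that was computed.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction (the data below has no ties)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean = (len(x) + 1) / 2
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def split_by_cuts(flow, error, cuts):
    """Group error rates by the flow interval each subject falls in."""
    groups = [[] for _ in range(len(cuts) + 1)]
    for f, e in zip(flow, error):
        groups[sum(f > c for c in cuts)].append(e)
    return groups


flow = [250, 300, 350, 400, 450, 500, 550, 600]  # liters/min, hypothetical
error = [9, 7, 8, 5, 6, 3, 4, 2]                 # percent, hypothetical

h_even = kruskal_wallis_h(split_by_cuts(flow, error, cuts=[425]))
h_skew = kruskal_wallis_h(split_by_cuts(flow, error, cuts=[325]))
rho = spearman_rho(flow, error)
print(round(h_even, 3), round(h_skew, 3), round(rho, 3))
```

Because the correlation uses every subject's actual flow rate, it does not depend on where the group boundaries are drawn, whereas the grouped test does.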

5.  Physical Condition

Four groups resulted from the subjects' self-
appraisal of their general physical condition and include
categories of fair/poor, average, good and outstanding
physical condition. Their total error rates were examined
to determine if a difference between the groups existed.
The results presented in Table XXVI do not allow us to
reject the null hypothesis. Additionally, a negligible
correlation coefficient presumes the two variables to be
independent of one another.

Although a subjective response was the determinant
for this characteristic, seven subjects who had colds
trained the recognizer. Their condition was such that a
distinct nasality was present while they spoke. A Mann-
Whitney test was performed to determine if a difference
between the healthy and 'cold' groups existed. The test
statistic of Table XXVI further verifies our previous
conclusion; the null cannot be rejected. The critical
regions for the Mann-Whitney test correspond to values
greater than 893.6 and less than 771.4.

Finally, the analysis for effect due to formal
speech therapy or voice training resulted in a test
statistic that would not allow for the rejection of the null
hypothesis, that speech therapy or voice training will not
affect a user's recognition accuracy. Critical regions
corresponded to values greater than 835 and less than 695.


F.  PSYCHOLOGICAL CHARACTERISTICS

1.  Hypotheses

a. H0: Anxiety will not affect the recognition
       accuracy of a user.

   H1: Anxiety will affect the total error rate of
       a user.

b. H0: The cooperativeness of a speaker will not
       affect his/her total error rate.

   H1: Speaker cooperativeness will affect
       recognition accuracy.

c. H0: The occurrence of recognition errors will
       not affect overall recognition accuracy.

   H1: A speaker's overall error rate will be
       affected by the psychological influence of
       mis- and non-recognitions.

d. H0: A speaker's beliefs in voice technology as
       a time saving job aid will not affect
       recognition accuracy.

   H1: The attitude a person possesses toward the
       influence of voice on a computer operator's
       job and their willingness to use voice
       because of this influence will affect
       recognition accuracy.

e. H0: The attitude a speaker has about computers
       and information processing will have no
       psychological effect on recognition
       accuracy.

   H1: A speaker's psychological attitude
       concerning automation and data processing
       will affect recognition accuracy.

2.  Psychological Anxiety

The results of the State-Trait Anxiety Inventory are
depicted graphically in Figures 24 to 26. Figures 24 and 25
show some indication that individuals with a lower state
anxiety acquired fewer errors. The relationship between
error rate and trait anxiety, shown in Figure 26, depicts a
more randomized occurrence of error rates. Correlation
analysis substantiates this in that state anxiety during
week #1 is statistically significant, with week #2 showing
some positive correlation but not significant at the .05
level. There is no significant positive correlation between
trait anxiety and error rates.

The obtained STAI scores yielded a normal
distribution and equal sample sizes of high and low anxiety
users. With the basic assumptions for use of a parametric
test met, a two sample t-test was used to detect differences
between groups. Additionally, the non-parametric Mann-
Whitney test was applied for purposes of further
verification; however, it does not possess the power of its
parametric counterpart. Results of the analysis are
included in Table XXVII.
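
The two sample t-test mentioned above can be sketched as follows. This is a minimal illustration assuming the pooled-variance (Student) form of the test, which matches the stated assumptions of normality and equal group sizes; the function name and the error rates are hypothetical and are not the thesis data.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance, plus its d.f."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)
    ) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / se, df


# Hypothetical error rates (percent) for low- and high-anxiety users.
low_anxiety = [4.2, 5.1, 5.8, 6.0, 6.4]
high_anxiety = [5.9, 6.8, 7.2, 7.5, 8.1]
t_stat, df = pooled_t(low_anxiety, high_anxiety)
print(round(t_stat, 3), df)
```

The statistic is compared against the t distribution with the returned degrees of freedom. Because the t-test uses the magnitudes of the observations rather than only their ranks, it can reject the null in borderline cases where the Mann-Whitney test does not.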

In all cases using non-parametric analysis the null
hypothesis cannot be rejected, although the critical level
shows the test statistic to be just within the acceptance
region. The dichotomy in the trait anxiety analysis is
interesting; the more powerful parametric test allows the
rejection of the null hypothesis whereas the opposite exists
using the Mann-Whitney. In both instances though, the test
statistic lies extremely close to that point separating the
acceptance and critical regions.

Figure 24. Mean Error Rate vs. State Anxiety (Week #1)

Figure 26. Mean Error Rate vs. Trait Anxiety

The effect due to anxiety may be considered as
inconclusive because of the resultant statistical analysis.
Although showing significant correlation in Week #1, any
anxiety in Week #2 may have been overcome or masked by
familiarity and experience with equipment and procedures.
By Week #3 and the administration of the Trait Inventory,
subjects were thoroughly versed in the experimental
procedure. The inconsistent results nevertheless leave
reason to believe that anxiety has an effect on speech and
hence recognition accuracy, but the degree to which it does
remains a clouded issue.

AFFECT ON RECOGNITION ACCURACY DUE TO ANXIETY

3.  Speaker Cooperativeness


Subjects evaluated their degree of cooperativeness
on an interval scale with subsequent creation of the
following groups.

a. Less than cooperative speakers

b. Moderately cooperative speakers

c. Very cooperative speakers

d. Extremely cooperative speakers (subjects who
marked the 'anchor point' of the scale)

The results of the analysis are presented in Table XXVIII,
with mean error rates graphically represented in Figure 27.
The null hypothesis is rejected due to a test statistic
greater than the Chi-square value of 7.815. Multiple
comparisons among the groups reflect an existent difference
between the 'less than cooperative' and 'extremely
cooperative' speakers only. Despite indication of some
correlation between high cooperativeness and low error rate,
the computed coefficient is not significant at a .05 level
(Critical Level: 0.095).

These results led to a further analysis from a
perspective of speaker participation. That is, did the
subject like participating in this type of experimentation
and if so, could it be correlated to total error rate?
Their subjective responses resulted in the creation of three
generic groups as follows:

a. Those who don't care

b. Persons who like to participate

c. Persons who strongly like to participate

In this instance the attainment of a positive correlation
indicating that those who liked to participate acquire
higher error rates is counter-intuitive. The null cannot be
rejected based on the computed test statistic given in Table
XXVIII. A correlation of .636 between subject responses to
cooperativeness and participation is not as large as was
expected and as such could, in part, have led to the
divergent results. Whether these results are due to willing
participants trying too hard to perform well and thus
having greater than usual mis- or non-recognitions is
unclear.

Figure 27. Mean Error Rate vs. Speaker Cooperativeness

TABLE XXVIII

AFFECT OF SPEAKER COOPERATION AND PARTICIPATION

+----------------+------------------+------------------+
!                ! COOPERATIVENESS  !  PARTICIPATION   !
+----------------+------------------+------------------+
! Type of Test   !  Kruskal-Wallis  !  Kruskal-Wallis  !
! Alpha          !       .05        !       .05        !
! Test Statistic !     16.82 **     !       4.76       !
! Critical Level !      < .005      !       .095       !
! Correlation    !                  !                  !
! Coefficient    !      -.226       !   +.27[?] **     !
+----------------+------------------+------------------+
** = Significant at stated level of significance

4.  Recognition Errors

Subjects responded to two questions, one pertaining
to their feelings at the time of a mis-recognition and the
other pertaining to their feelings over a non-recognition
(beep). Their responses to these two questions were
averaged to represent how they felt toward the occurrence of
an error and this led to the creation of two distinct
groups: those who don't like an error to occur and those who
feel they are not disturbed or bothered by an error. The
results of the analysis are summarized in Table XXIX.

TABLE XXIX

AFFECT OF RECOGNITION ERRORS

+----------------+----------------+
!                !     ERRORS     !
+----------------+----------------+
! Type of Test   !  Mann-Whitney  !
! Alpha          !      0.05      !
! Test Statistic !  [illegible]   !
! Critical Level !     .0897      !
! Correlation    !                !
! Coefficient    !     -0.225     !
+----------------+----------------+
** = Significant at stated level of significance

The null hypothesis cannot be rejected, and although the
negative correlation coefficient indicates that those who
dislike errors tend to have higher error rates, it is not
significant at an alpha of .05 (Critical Level: .07).

5.  Attitudes Toward the Use of Voice

Questions 4, 6 and 8 of User Questionnaire #2 were
used to measure the speaker's attitudes toward voice
technology. The results (Table XXX) indicate a
statistically significant correlation between high error
rates and a favorable attitude toward voice recognition as a
means of saving time and reducing the burden on a computer
operator. Scatter plots of responses to these questions and
associated error rates are depicted in Figures 28-30.
Multiple comparisons between the groups showed differences
between those who would always use voice and those who would
seldom use it despite its pronounced advantages, and between
those who felt that the advantages of voice will give the
keyboard operator other jobs and those who disagree with
such an attitude. Therefore, the null hypothesis cannot be
rejected in terms of a speaker's attitude concerning the
influence on a data processor's job due to voice
recognition. On the other hand, a speaker's willingness to
use voice recognition because of his/her beliefs in its
requisite advantages will affect error rates.

Figure 2[?]. Scatter Plot: Mean Error Rate vs. Question #[?]

As was noted earlier, the presence of a positive
correlation appears to be contrary to popular belief. One
would imagine that a user who believes voice recognition can
make the job of a computer operator easier (Question #4)
would tend toward better recognition accuracy. Questions
six and eight were asked for the purpose of determining if
a user's error rate might be influenced by the subconscious
thought of encountering additional duties because of the
efficiency and effectiveness of voice input. But, despite
the possibility of additional tasks, potential users still
would prefer voice to manual entry. However, the presence
of a significant positive correlation may only be attributed
to the uniqueness of the situation; i.e., as in speaker
participation, subjects who professed a strong desire to use
voice regardless of consequences may have tried too hard for
high accuracy and as a result have failed to speak in a
'natural' manner.

6.  Attitude Toward Computers and Information Processing

In response to two sets of questions, subjects
provided their attitudes surrounding the necessity of
computers in today's society and how voice technology would
aid information processing or data input. Attitudes towards
computers fell into three general categories.

a. Persons who feel computers are unnecessary.

b. Persons that feel computers are necessary in
society, but are not a panacea for all problems.

c. Those who feel that computers are an absolute
necessity.

Attitudes toward voice recognition and information
processing resulted in four categories.

a. Those believing that voice would take more time
for information or data processing.

b. Those with no opinion.

c. Those who feel voice will save some time.

d. Those who feel voice can save immeasurable time
compared to conventional methods of data entry
and information processing.

Results of the analysis are summarized in Table XXXI. Based
on these results, the null hypothesis cannot be rejected and
thus, it may be concluded that the opinion or attitude a
person possesses towards computers, and their feelings
pertaining to voice as a time saving advantage, will not
affect their recognition accuracy.


TABLE XXXI

AFFECT DUE TO ATTITUDES TOWARD COMPUTERS
AND DATA PROCESSING

+----------------+------------------+------------------+
!                !    COMPUTERS     ! DATA PROCESSING  !
+----------------+------------------+------------------+
! Type of Test   !  Kruskal-Wallis  !  Kruskal-Wallis  !
! Alpha          !       .05        !       .05        !
! Test Statistic !       .78        !       3.38       !
! Critical Level !       > .8       !       .15        !
! Correlation    !                  !                  !
! Coefficient    !       .111       !      -.164       !
+----------------+------------------+------------------+
** = Significant at stated level of significance

G.  VOCABULARY ERRORS

As a result of using different numbers of syllables in
the vocabulary, it was also possible to get an indication of
how well utterances with different numbers of syllables were
recognized. Originally done in a longitudinal study [Ref.
24: pp. 9-10], it is analyzed within the context of this
document as further verification of those earlier results.
This is shown by weeks in Figure 31 and over all conditions
in Figure 32. Both figures illustrate a generally declining
error rate as a function of the number of syllables in the
utterance. Although the current experimentation yielded an
approximately 1.5 percent rise in error rate from three to
four syllables, it is not a large deviation from the earlier
study, which indicated little change in error rates between
three and four syllable words.

In terms of overall effectiveness, a practical
application would dictate the least amount of recognition
errors. Therefore, an error rate of 5.91% still remains two
to three percent better than utterances with a smaller
syllabic content. Despite the higher rate for four syllable
compared to five syllable words, the difference is still
less than that of one to four or two to four syllables. The
variety of vocabulary items used in this experiment further
confirms the argument that through a careful and judicious
selection of vocabulary items, large vocabulary difficulties
and associated high error rates may be reduced.

VI.  CONCLUSIONS

Following the lengthy elaboration of results in the
previous section it would be helpful to recapitulate, in a
brief summary form, the responses of the different variables
tested. Variables resulting in a statistically significant
test statistic included:

-- Method of training

-- Experience of the user

-- Previous computer experience

-- Level of education (all subjects)

-- Vital capacity

-- Speaker cooperativeness

The following variables produced a significant
correlation with recognition error rate.

-- Previous computer experience

-- Time of the week

-- Experience of the user

-- Level of education (all subjects)

-- Speaker participation

-- Vital capacity

-- Rate of air flow

-- State anxiety (first week only)

-- User attitudes pertaining to voice

The following variables resulted in a non-significant
test statistic and/or correlation coefficient.

-- Job function

-- Branch of service

-- Job satisfaction

-- Service satisfaction

-- Foreign language competency

-- Time of day

-- Time of week (test statistic only)

-- Ease of use of voice equipment

-- Level of education (naive users)

-- Socio-economic class

-- Dental care

-- Race

-- Marital status and family size

-- Religious preference

-- Accent

-- Place of birth/geographic origin

-- Age

-- Height and weight

-- Rate of airflow (test statistic)

-- Physical conditioning/speech training

-- Anxiety: State and Trait

-- Speaker cooperativeness (correlation)

-- Speaker participation (test statistic)

-- Affect of recognition errors

-- Attitudes toward computers/data processing

The wide range in error rates for the individual subjects,
0.50 to 15.7 percent (see Appendix J for a complete
summary), indicates an obvious variability between subjects.
Within the context of the main experiment and the associated
ANOVA, the three variables of job function, training method,
and experience (trials) are independent and are
protected from confounding by the experimental design.
The level of significance of .05 was selected
merely to show the possible existence of some effect, not to
provide a rigorous test of a stated hypothesis. As the
analysis progresses to the extraction of numerous other
human factors, these protections and the accompanying power
of a parametric test are reduced. In some instances an
awareness of a possible dependence between conditions is
necessary before reaching an ultimate conclusion. For
example, were those subsets of a category achieving
statistical significance also trained with supervision
and/or experienced users and, if so, how many were in that
particular subset?
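The dependence check raised in that closing question can be sketched as a simple cross-tabulation. This is only an illustration: the record layout, field names, and sample values below are hypothetical and are not data from the experiment.

```python
from collections import Counter


def cross_tabulate(subjects, factor, condition):
    """Count subjects in each (factor level, condition level) cell."""
    return Counter((s[factor], s[condition]) for s in subjects)


# Hypothetical subject records; the values are illustrative only.
subjects = [
    {"education": "graduate", "training": "supervised"},
    {"education": "graduate", "training": "supervised"},
    {"education": "graduate", "training": "unsupervised"},
    {"education": "high school", "training": "supervised"},
    {"education": "high school", "training": "unsupervised"},
]

table = cross_tabulate(subjects, "education", "training")
for (level, training), count in sorted(table.items()):
    print(f"{level:12s} {training:13s} {count}")
```

A cell that holds most of a subset (for example, nearly all members of one education level having received supervised training) would suggest that the two factors cannot be separated in that comparison.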

The results presented herein suggest that speaker
variability would not affect recognition accuracy to such an
extent as to restrict the use of voice equipment to specially
selected users. For implementation in military applications, this
is especially satisfying, since it would relieve the
services of the necessity of classifying personnel into
particular military occupational specialties or
subspecialties for the express purpose of operating voice
equipment. It is apparent from the experimentation, and from the
diversity of skills and experience contained within the
sample population, that practically anyone may be a
potential candidate to operate voice recognition equipment.

The phrase 'practically anyone' should be qualified
here. Interspeaker variability had a significant impact in
the case of one subject, who possessed a severe speech
impairment: stuttering. It became obvious in the early
stages of training that he would be unable to finish the
training phase. In fact, after 30 minutes, only 11
utterances had been satisfactorily placed into memory.
Although the individual was eliminated as an experimental
subject, his difficulty demonstrates that, while almost
anyone can use this type of technology, there will always
exist those, albeit few in number, who for one reason or
another are unable to attain a suitable level of recognition
accuracy.

The current experimentation has clearly shown that
experience and method of training voice equipment can
provide excellent recognition accuracy rates. Of course,
what constitutes an 'excellent' rate is purely subjective and
dependent upon the application in which the equipment is
emplaced. What makes this observation readily appealing is
that both characteristics are controlled by the human. They
are not factors that one is born with or has inherited. Rather,
with training procedures closely supervised by an
experienced operator, a 'naive' user can quickly attain
recognition rates greater than 95 percent and, with
repetitive experience, increase this accuracy until errors
are reduced to less than two percent. It must be reiterated
that in the present experiment, subjects were not allowed to
retrain the recognizer during the three weeks of recognition
testing. In actual use, the speaker would retrain an
utterance rather than continue incurring mis- or
non-recognition errors.

To a lesser degree, speaker cooperativeness and amount
of previous computer experience are definitely factors to be
considered. The latter characteristic influences the
personnel selection process, while speaker cooperativeness,
like training and experience, can be influenced by the human
element. Certainly, because of data processing experience,
such individuals can readily identify with the advantages of
speech input and thereby become a more, or highly, cooperative
speaker. Thus combined, these two factors strongly support
the potential for achieving high recognition accuracy.

The occasional positive correlation coefficients that
were statistically significant are
difficult to explain or resolve conclusively. Such
instances as level of participation, desire to use voice,
and attitudes pertaining to voice provided misleading
results. It was surmised that speakers who are willing
participants and find voice to be a technology that they
would likely use would achieve low error rates. The
observation to the contrary suggests that many of those
speakers tried too hard for perfect recognition accuracy
and, as a result, were less apt to speak naturally. In
effect, they were trying to outsmart the machine.
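For readers who want to see how the sign and strength of such a coefficient arise, the sketch below computes a correlation between a questionnaire scale position and error rate. It assumes a Pearson product-moment coefficient, which the text does not specify, and the data values are hypothetical, not taken from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical data: scale positions and error rates (percent).
attitude = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
error_rate = [4.1, 3.2, 2.9, 2.0, 1.8, 0.9]

r = pearson_r(attitude, error_rate)
print(f"r = {r:.3f}, t = {t_statistic(r, len(attitude)):.2f}")
```

The t value would be compared against a tabled critical value at the chosen level of significance (.05 in this study) with n - 2 degrees of freedom. How the sign is interpreted depends on how the scale was coded, which is not restated here.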

Thus, in an operational environment it becomes incumbent
upon both the speaker and the supervisor to fully embrace
the concept of voice technology for use in a practical
application. In demonstrations at the Naval Postgraduate
School it is frequently noted that observers are genuinely
impressed with the capabilities of voice input of data until
that one error, sometimes after more than 200 successfully
recognized utterances, occurs and they sit back and remark
that perhaps "additional research is needed prior to placing
it into operational use". It is obvious that voice
technology is acceptable for use in a military command
center and must be fully supported by the Commander and his
Staff. If it is, error rates can be minimized by human
controls such as training and experience. In conclusion,
consistency may best describe the key to speaker
variability. Attitudes, training, and experience together
produce consistency in speech, and consistency generates a
continued high recognition accuracy rate.


APPENDIX A


USER QUESTIONNAIRE #1


NAME: _  SUBJECT#: _

INSTRUCTIONS:

The purpose of this questionnaire is to obtain information
from you regarding physical characteristics, personal
background, and opinions pertaining to voice recognition
equipment and its use. Your answers will assist in
determining whether personal and/or physiological traits
contribute to effective utilization of voice recognition
equipment.

The questions include multiple choice, YES/NO, rating scale
and short answer (one or two words ONLY!) types.
Appropriate guidance accompanies each question or block of
questions.

Your name is NOT required but is requested in order to ease
the necessary correlation of your replies with your results
in the experimentation. If you desire anonymity, please
respond with your subject number only. Please respond
truthfully. Check your questionnaire after completion to
insure you've completed all the questions.

Thank you for your assistance in this experiment.

In questions 1 - 22, provide either a one or two word
response, or place an 'X' by the appropriate answer.

1. What is your age? _

2. What is your height (in inches)? _

3. What is your weight? _

4. What is your race?

_ White (Caucasian)
_ Yellow (Asian/Mongoloid)
_ Black (Negroid/African)
_ Red (American Indian)

5. What is your nationality?

_ Native Citizen of the United States
_ Naturalized Citizen of the United States
_ Alien

6. What is your religious preference? _
(See Attached Sheet)

7. What is your ethnic background?

_ Puerto Rican
_ Filipino
_ Mexican
_ Cuban
_ Latin American (persons from Central or S. America)
_ Other Hispanic Descent (Extraction not delineated
  as Mexican, Puerto Rican, Cuban or Latin American)
_ Eskimo
_ Aleut
_ Indian
_ Melanesian
_ Chinese
_ Japanese
_ Korean
_ Polynesian
_ Vietnamese
_ Other Asian Descent (Extraction not delineated as
  Chinese, Japanese, Korean, Indian, Filipino, or
  Vietnamese)
_ None of the Above
_ Other (Please specify _)

8. Do you have an accent?

_ YES (what kind? _)
_ NO

9. What is your Marital Status?

_ Married
_ Divorced
_ Single
_ Other (separated, widowed)

10. How many children do you have?

_ 0
_ 1

11. Do you wear glasses?

_ YES
_ NO

12. Have you ever had orthodontist care and/or wear/worn
braces?

_ YES
_ NO

13. What is your level of education?

_ Non High School Graduate
_ High School Graduate
_ Associate Degree
_ 1 year of college
_ 2 years of college
_ 3 years of college
_ 4 years of college (no degree)
_ College graduate (BA/BS)
_ Graduate work of more than 1 year (no degree)
_ Masters Degree received
_ Doctorate Degree received

14. What state were you born in?

15. During ages 1-18, in what state did you principally
reside?

16. What has been your state of residence for the majority
of the last three years?

17. Do you speak any foreign language(s)?

_ YES (which one(s) _)
_ NO

18. What is your branch of service?

_ Navy
_ Army
_ Marine Corps
_ Air Force
_ Other (civilian)

19. How many years have you been in the service?

20. Have you ever been overseas for more than 13
consecutive months? (not including leave or vacation)

_ YES (go to question #21)
_ NO (go to question #22)

21. How many months were you overseas?

In what country?

22. What do you consider to be your socioeconomic class?

_ Lower Class
_ Upper Lower Class
_ Lower Middle Class
_ Middle Class
_ Upper Middle Class
_ Lower Upper Class
_ Upper Class


In questions 23 - 40, place an 'X' on a point on the scale
that best indicates or describes your feelings. The 'X' may
be placed anywhere along the scale.


23. How do you feel about the job or position you currently
have?

LIKE VERY MUCH | LIKE | NEUTRAL | DISLIKE | DISLIKE VERY MUCH

24. How much satisfaction do you derive from being a member
of the Armed Services?

VERY SATISFIED | SATISFIED | BORDERLINE | UNSATISFIED | VERY UNSATISFIED

25. Computers are necessary in today's society.

DECIDEDLY AGREE | SLIGHTLY AGREE | NO OPINION/DON'T KNOW | SLIGHTLY DISAGREE | DECIDEDLY DISAGREE

26. How would voice recognition make a computer operator's
job?

MUCH EASIER | SOMEWHAT EASIER | NO OPINION | MORE DIFFICULT | MUCH MORE DIFFICULT

27. How would voice recognition equipment affect
information processing or data input?

SAVE A LOT OF TIME | SAVE SOME TIME | NO OPINION/DON'T KNOW | TAKES MORE TIME | TAKES A LOT MORE TIME

28. If voice recognition can save time, it would allow a
keyboard operator to do other jobs.

DECIDEDLY AGREE | SLIGHTLY AGREE | NO OPINION/DON'T KNOW | SLIGHTLY DISAGREE | DECIDEDLY DISAGREE


29. Describe the use of voice recognition equipment.

VERY EASY TO USE | EASY TO USE | NO OPINION | DIFFICULT TO USE | VERY DIFFICULT TO USE

30. What do you think of voice recognition equipment for
use in Military Command Centers?

VERY PRACTICAL | SOMEWHAT PRACTICAL | NO OPINION/DON'T KNOW | SOMEWHAT IMPRACTICAL | VERY IMPRACTICAL

31. How much previous computer experience have you had?

A LOT OF EXPERIENCE | CONSIDERABLE EXPERIENCE | SOME EXPERIENCE | VERY LITTLE EXPERIENCE | NO EXPERIENCE

32. What is your previous experience with voice recognition
equipment?

VERY MUCH | MUCH | SOME | A LITTLE | NONE

33. How would additional experience with voice recognition
equipment affect recognition accuracy?

MUCH IMPROVEMENT | SOME IMPROVEMENT | NO OPINION | A LITTLE IMPROVEMENT | NO IMPROVEMENT

34. How do you feel when a misrecognition occurs?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE

35. How do you feel when a non-recognition ('beep') occurs?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE

36. How do you feel when a recognition occurs?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE


37. Describe your participation in this experiment.

EXTREMELY COOPERATIVE | MODERATELY COOPERATIVE | COOPERATIVE | SOMEWHAT UNCOOPERATIVE | VERY UNCOOPERATIVE

38. How would you describe your participating in this type
of experimentation?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE

39. What is your current physical condition?

OUTSTANDING | GOOD | AVERAGE | FAIR | POOR

40. If voice recognition does save time and allows YOU to
be assigned other tasks, how often would YOU want to use it?

ALWAYS | FREQUENTLY | NOW AND THEN | SELDOM | NEVER


USER QUESTIONNAIRE #2

NAME: _  SUBJECT#: _

INSTRUCTIONS:

The purpose of this questionnaire is to obtain information
from you regarding physical characteristics, personal
background, and opinions pertaining to voice recognition
equipment and its use. Your answers will assist in
determining whether personal and/or physiological traits
contribute to effective utilization of voice recognition
equipment.

The questions include multiple choice, YES/NO, rating scale
and short answer (one or two words ONLY!) types.
Appropriate guidance accompanies each question or block of
questions.

Your name is NOT required but is requested in order to ease
the necessary correlation of your replies with your results
in the experimentation. If you desire anonymity, please
respond with your subject number only. Please respond
truthfully. Check your questionnaire after completion to
insure you've completed all the questions.

Thank you for your assistance in this experiment.

In questions 1 - 3, provide either a one or two word
response, or place an 'X' by the appropriate answer.


1. Have you ever had one or more of the following speech
impediments and/or impairments?

_ Articulation (difficulty in pronouncing vowels
  and/or consonants)
_ Voice (irregularities in the larynx)
_ Cleft lip and/or lip palate
_ Cerebral palsy
_ Stuttering
_ Hearing impairments
_ Aphasia
_ Congenital speech defects (due to birth/pregnancy)
_ Retardation
_ None of the above

2. Have you ever received speech therapy from either a
subsidized (free) clinic, private speech therapist, or
through the public school system?

_ YES
_ NO

3. Have you ever received voice training or taken singing
lessons?

_ YES (How many years? _)


In questions 4 - 15, place an 'X' on a point on the scale
that best indicates or describes your feelings. The 'X' may
be placed anywhere along the scale.


4. How would voice recognition make a computer operator's
job?

MUCH EASIER | SOMEWHAT EASIER | NO OPINION | MORE DIFFICULT | MUCH MORE DIFFICULT

5. How would voice recognition equipment affect
information processing or data input?

SAVE A LOT OF TIME | SAVE SOME TIME | NO OPINION/DON'T KNOW | TAKES MORE TIME | TAKES A LOT MORE TIME

6. If voice recognition can save time, it would allow a
keyboard operator to do other jobs.

DECIDEDLY AGREE | SLIGHTLY AGREE | NO OPINION/DON'T KNOW | SLIGHTLY DISAGREE | DECIDEDLY DISAGREE

7. Describe the use of voice recognition equipment.

VERY EASY TO USE | EASY TO USE | NO OPINION | DIFFICULT TO USE | VERY DIFFICULT TO USE


8. If voice recognition does save time and allows YOU to
be assigned other tasks, how often would YOU want to use it?

ALWAYS | FREQUENTLY | NOW AND THEN | SELDOM | NEVER

9. How would additional experience with voice recognition
equipment affect recognition accuracy?

MUCH IMPROVEMENT | SOME IMPROVEMENT | NO OPINION | A LITTLE IMPROVEMENT | NO IMPROVEMENT

10. How do you feel when a misrecognition occurs?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE

11. How do you feel when a non-recognition ('beep') occurs?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE

12. How do you feel when a recognition occurs?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE

13. Describe your participation in this experiment.

EXTREMELY COOPERATIVE | MODERATELY COOPERATIVE | COOPERATIVE | SOMEWHAT UNCOOPERATIVE | VERY UNCOOPERATIVE

14. How would you describe your participating in this type
of experimentation?

STRONGLY LIKE | LIKE | NEUTRAL | DISLIKE | STRONGLY DISLIKE

15. What do you think of voice recognition equipment for
use in Military Command Centers?

VERY PRACTICAL | SOMEWHAT PRACTICAL | NO OPINION/DON'T KNOW | SOMEWHAT IMPRACTICAL | VERY IMPRACTICAL


SELF-EVALUATION QUESTIONNAIRE

NAME _  DATE _  SUBJECT# _

DIRECTIONS: A number of statements which people have used
to describe themselves are given below. Read each statement
and then circle the appropriate number to the right of the
statement that indicates how you GENERALLY feel. There are
no right or wrong answers. Please do not spend too much
time on any one statement, but give the answer which seems
to describe how you GENERALLY feel.

1 = ALMOST NEVER
2 = SOMETIMES
3 = OFTEN
4 = ALMOST ALWAYS


 1. I feel pleasant    1  2  3  4

 2. I tire quickly    1  2  3  4

 3. I feel like crying    1  2  3  4

 4. I wish I could be as happy as
    others seem to be    1  2  3  4

 5. I am losing out on things because
    I can't make up my mind soon
    enough    1  2  3  4

 6. I feel rested    1  2  3  4

 7. I am "calm, cool, and collected"    1  2  3  4

 8. I feel that difficulties are
    piling up so that I cannot
    overcome them    1  2  3  4

 9. I worry too much over something
    that really doesn't matter    1  2  3  4

10. I am happy    1  2  3  4

11. I am inclined to take things hard    1  2  3  4

12. I lack self confidence    1  2  3  4

13. I feel secure    1  2  3  4

14. I try to avoid facing a crisis
    or difficulty    1  2  3  4

15. I feel blue    1  2  3  4

16. I am content    1  2  3  4

17. Some unimportant thought runs
    through my mind and bothers me    1  2  3  4

18. I take disappointments so keenly
    that I can't put them out of my
    mind    1  2  3  4

19. I am a steady person    1  2  3  4

20. I get in a state of tension or
    turmoil as I think over my recent
    concerns and interests    1  2  3  4


SCORING KEY
for the
A-TRAIT EVALUATION


APPENDIX D

SELF-EVALUATION QUESTIONNAIRE


NAME _  DATE _  SUBJECT# _


DIRECTIONS: A number of statements which people have used
to describe themselves are given below. Read each statement
and then circle the appropriate number to the right of the
statement that indicates how you feel RIGHT NOW -- AT THIS
VERY MOMENT. There are no right or wrong answers. Please
do not spend too much time on any one statement, but give
the answer that best describes your PRESENT feelings.

1 = NOT AT ALL
2 = SOMEWHAT
3 = MODERATELY SO
4 = VERY MUCH SO

 1. I feel calm    1  2  3  4

 2. I feel secure    1  2  3  4

 3. I am tense    1  2  3  4

 4. I am regretful    1  2  3  4

 5. I feel at ease    1  2  3  4

 6. I feel upset    1  2  3  4

 7. I am presently worrying
    over possible misfortunes    1  2  3  4

 8. I feel rested    1  2  3  4

 9. I feel anxious    1  2  3  4

10. I feel comfortable    1  2  3  4

11. I feel self-confident    1  2  3  4

12. I feel nervous    1  2  3  4

13. I am jittery    1  2  3  4

14. I feel "high strung"    1  2  3  4

SCORING KEY
for the
A-STATE EVALUATION

 1.    4    3
 2.    4    3
 3.    1    2
 4.    1    2
 5.    4    3
 6.    1    2
 7.    1    2
 8.    4    3
 9.    1    2
10.    4    3
11.    4    3
12.    1    2
13.    1    2
14.    1    2
15.    4
16.    4
17.    1
18.    1
19.    4
20.    4
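As a hedged illustration of how a key of this kind is typically applied: an item whose first weight is 4 is scored in reverse (response 1 counts 4, response 4 counts 1), the remaining items are scored directly, and the item scores are summed. The sketch below takes only the first-column weights from the key and assumes the unlisted weights continue 4-3-2-1 or 1-2-3-4; the function names are hypothetical.

```python
# First-column weights from the scoring key; 4 marks a reverse-scored item.
FIRST_WEIGHTS = [4, 4, 1, 1, 4, 1, 1, 4, 1, 4,
                 4, 1, 1, 1, 4, 4, 1, 1, 4, 4]


def item_score(item, response):
    """Score one item (1-20) for a circled response (1-4)."""
    if not 1 <= item <= len(FIRST_WEIGHTS):
        raise ValueError("item must be 1-20")
    if not 1 <= response <= 4:
        raise ValueError("response must be 1-4")
    reverse = FIRST_WEIGHTS[item - 1] == 4
    # Assumption: reverse-scored items run 4-3-2-1, the others 1-2-3-4.
    return 5 - response if reverse else response


def total_score(responses):
    """Sum the item scores for a completed 20-item form."""
    if len(responses) != len(FIRST_WEIGHTS):
        raise ValueError("expected 20 responses")
    return sum(item_score(i, r) for i, r in enumerate(responses, start=1))


print(total_score([1] * 20))
```

Under these assumptions a completed form scores between 20 and 80.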
APPENDIX E


UTTERANCE LIST: TRAINING WEEK - WEEK #1


WORD#  UTTERANCE                  CRT PROMPT

000    THREE                      THREE
001    EUROPE                     EUROPE
002    MOVE IT LEFT               MOVE IT LEFT
003    CARRIAGE RETURN            CARR RETURN
004    LOGOUT                     LOGOUT
005    COMMAND                    COMMAND
006    STRAIT OF HORMUZ           STR OF HMRZ
007    TIME                       TIME
008    KOREA                      KOREA
009    ZERO                       ZERO
010    CHANGE DIRECTORY TO POOCK  C DIR TO PK
011    ALPHA                      ALPHA
012    POSITIVE                   POSITIVE
013    IDENTIFICATION             IDNTFICATION
014    LAUNCH                     LAUNCH
015    RELOCATE                   RELOCATE
016    DELTA                      DELTA
017    TASK FORCE COMMANDER       TSK FRC CDR
018    KILO                       KILO
019    LOGIN YELLEN               LOGIN YELLEN
020    ECHO                       ECHO
021    NOVEMBER                   NOVEMBER
022    TWO                        TWO
023    UNITED STATES              UNITED STS
024    FOUR                       FOUR
025    BRAVO                      BRAVO
026    PLACE A CIRCLE ON MOSCOW   PL A CIR MOS
027    ENEMY DETECTION            EN DETECTION
028    PROCEED                    PROCEED
029    ROMEO                      ROMEO
030    FLIGHT CONTROLLER          FLT CTLR
031    SEVEN                      SEVEN
032    GROUND CONTROL APPROACH    GND CTL APPR
033    REPORT                     REPORT
034    AIRFIELD NAME              ARFD NAME
035    LIMA                       LIMA
036    AVAILABLE                  AVAILABLE
037    MESSAGE                    MESSAGE
038    SATELLITE                  SATELLITE
039    SHOOT                      SHOOT
040    YANKEE                     YANKEE
041    AFFIRMATIVE                AFFIRMATIVE
042    CHARLIE                    CHARLIE
043    TORPEDO                    TORPEDO
044    FIVE                       FIVE
045    OPERATIONS PLAN            OPNS PLAN
046    OFFENSE                    OFFENSE
047    UP IN DETAIL               UP IN DETAIL
048    NINE                       NINE
049    PROBABILITY OF DETECTION   PROB OF DETN
050    NEUTRAL                    NEUTRAL
051    JULIETT                    JULIETT
052    SPEED                      SPEED
053    UNIFORM                    UNIFORM
054    SENSOR                     SENSOR
055    TANGO                      TANGO
056    CLOSE OUT CHARLIE          CLS OUT CHRL
057    LOAD THE GANN              LD THE GANN
058    OSCAR                      OSCAR
059    NORTH ATLANTIC MAP         N ATL MAP
060    PACIFIC DATA BASE          PAC DAT BASE
061    HUMAN FACTORS              HUM FACTORS
062    FOXTROT                    FOXTROT
063    SOVIET                     SOVIET
064    DEFENSE                    DEFENSE
065    ONE                        ONE
066    INDIA                      INDIA
067    ADVANTAGES                 ADVANTAGES
068    GOLF                       GOLF
069    CANCEL                     CANCEL
070    ZULU                       ZULU
071    NEGATIVE                   NEGATIVE
072    PLOT ALL SUBMARINES        PLT ALL SUBS
073    XRAY                       XRAY
074    REFUEL                     REFUEL
075    AUTOMATIC RECOGNITION      AUTO RECOG
076    QUEBEC                     QUEBEC
077    TRACK ENEMY                TRACK ENEMY
078    LEVEL TWO                  LEVEL TWO
079    COURSE                     COURSE
080    JOINT TASK FORCE           JT TSK FRC
081    SIX                        SIX
082    WHISKEY                    WHISKEY
083    ATTACK                     ATTACK
084    SIERRA                     SIERRA
085    MANEUVER DELAY             MNUVR DELAY
086    DISTANCE                   DISTANCE
087    EXECUTE                    EXECUTE
088    EIGHT                      EIGHT
089    VICTOR                     VICTOR
090    MEDITERRANEAN MAP          MED MAP
091    SEA OF JAPAN               SEA OF JAPN
092    POPPA                      POPPA
093    FILE TRANSFER PROTOCOL     FL TNSFR PRO
094    ALTITUDE                   ALTITUDE
095    HOTEL                      HOTEL
096    NUKE THEM TILL THEY GLOW   NUKE EM
097    ACCAT TITLE                ACCAT TITLE
098    MIKE                       MIKE
099    MISSILE                    MISSILE


APPENDIX F


UTTERANCE LIST: WEEK #2


WORD#  UTTERANCE

000    MISSILE
001    MIKE
002    ACCAT TITLE
003    NUKE THEM TILL THEY GLOW
004    HOTEL
005    ALTITUDE
006    FILE TRANSFER PROTOCOL
007    POPPA
008    SEA OF JAPAN
009    MEDITERRANEAN MAP
010    VICTOR
011    EIGHT
012    EXECUTE
013    DISTANCE
014    MANEUVER DELAY
015    SIERRA
016    ATTACK
017    WHISKEY
018    SIX
019    JOINT TASK FORCE
020    COURSE
021    LEVEL TWO
022    TRACK ENEMY
023    QUEBEC
024    AUTOMATIC RECOGNITION
025    REFUEL
026    XRAY
027    PLOT ALL SUBMARINES
028    NEGATIVE
029    ZULU
030    CANCEL
031    GOLF
032    ADVANTAGES
033    INDIA
034    ONE
035    DEFENSE
036    SOVIET
037    FOXTROT
038    HUMAN FACTORS
039    PACIFIC DATA BASE
040    NORTH ATLANTIC MAP
041    OSCAR
042    LOAD THE GANN
043    CLOSE OUT CHARLIE
044    TANGO
045    SENSOR
046    UNIFORM
047    SPEED
048    JULIETT
049    NEUTRAL
050    PROBABILITY OF DETECTION
051    NINE
052    UP IN DETAIL
053    OFFENSE
054    OPERATIONS PLAN
055    FIVE
056    TORPEDO
057    CHARLIE
058    AFFIRMATIVE
059    YANKEE
060    SHOOT
061    SATELLITE
062    MESSAGE
063    AVAILABLE
064    LIMA
065    AIRFIELD NAME
066    REPORT
067    GROUND CONTROL APPROACH
068    SEVEN
069    FLIGHT CONTROLLER
070    ROMEO
071    PROCEED
072    ENEMY DETECTION
073    PLACE A CIRCLE ON MOSCOW
074    BRAVO
075    FOUR
076    UNITED STATES
077    TWO
078    NOVEMBER
079    ECHO
080    LOGIN YELLEN
081    KILO
082    TASK FORCE COMMANDER
083    DELTA
084    RELOCATE
085    LAUNCH
086    IDENTIFICATION
087    POSITIVE
088    ALPHA
089    CHANGE DIRECTORY TO POOCK
090    ZERO
091    KOREA
092    TIME
093    STRAIT OF HORMUZ
094    COMMAND
095    LOGOUT
096    CARRIAGE RETURN
097    MOVE IT LEFT
098    EUROPE
099    THREE
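Comparing this list with the training/Week #1 list in Appendix E, the Week #2 order appears to be the Week #1 order reversed. The text does not state how the order was produced, so the sketch below is an observation about the two lists, not the author's procedure; the function name is hypothetical and the sample entries are taken from the lists themselves.

```python
def week2_word_number(week1_word_number, list_length=100):
    """Map a Week #1 WORD# to its Week #2 WORD#, assuming a reversed list."""
    if not 0 <= week1_word_number < list_length:
        raise ValueError("word number out of range")
    return list_length - 1 - week1_word_number


# A few Week #1 entries, keyed by their Week #1 WORD#.
week1_sample = {0: "THREE", 6: "STRAIT OF HORMUZ", 57: "LOAD THE GANN", 99: "MISSILE"}

for number, utterance in week1_sample.items():
    print(f"{utterance}: week 1 #{number:03d} -> week 2 #{week2_word_number(number):03d}")
```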




APPENDIX G


UTTERANCE LIST: WEEK #3


WORD#  UTTERANCE

000    CARRIAGE RETURN
001    STRAIT OF HORMUZ
002    ZERO
003    POSITIVE
004    RELOCATE
005    KILO
006    NOVEMBER
007    FOUR
008    ENEMY DETECTION
009    FLIGHT CONTROLLER
010    REPORT
011    AVAILABLE
012    SHOOT
013    CHARLIE
014    OPERATIONS PLAN
015    NINE
016    JULIETT
017    SENSOR
018    LOAD THE GANN
019    PACIFIC DATA BASE
020    SOVIET
021    INDIA
022    CANCEL
023    PLOT ALL SUBMARINES
024    AUTOMATIC RECOGNITION
025    LEVEL TWO
026    SIX
027    SIERRA
028    EXECUTE
029    MEDITERRANEAN MAP
030    FILE TRANSFER PROTOCOL
031    NUKE THEM TILL THEY GLOW
032    MISSILE
033    MOVE IT LEFT
034    COMMAND
035    KOREA
036    ALPHA
037    LAUNCH
038    TASK FORCE COMMANDER
039    ECHO
040    UNITED STATES
041    PLACE A CIRCLE ON MOSCOW
042    ROMEO
043    GROUND CONTROL APPROACH
044    LIMA
045    SATELLITE
046    AFFIRMATIVE
047    FIVE
048    UP IN DETAIL
049    NEUTRAL
050    UNIFORM
051    CLOSE OUT CHARLIE
052    NORTH ATLANTIC MAP
053    FOXTROT
054    ONE
055    GOLF
056    NEGATIVE
057    REFUEL
058    TRACK ENEMY
059    JOINT TASK FORCE
060    ATTACK
061    DISTANCE
062    VICTOR
063    POPPA
064    HOTEL
065    MIKE
066    EUROPE
067    LOGOUT
068    TIME
069    CHANGE DIRECTORY TO POOCK
070    IDENTIFICATION
071    DELTA
072    LOGIN YELLEN
073    THREE
074    TWO
075    BRAVO
076    PROCEED
077    SEVEN
078    AIRFIELD NAME
079    MESSAGE
080    YANKEE
081    TORPEDO
082    OFFENSE
083    PROBABILITY OF DETECTION
084    SPEED
085    TANGO
086    OSCAR
087    HUMAN FACTORS
088    DEFENSE
089    ADVANTAGES
090    ZULU
091    XRAY
092    QUEBEC
093    COURSE
094    WHISKEY
095    MANEUVER DELAY
096    EIGHT
097    SEA OF JAPAN
098    ALTITUDE
099    ACCAT TITLE


APPENDIX H


DATA  COLLECT ION  A  CRh 


NAM: 


SIX:  V  F  SUBJECT  #: 


RANK:  _  LAY/TIM:  (TRIALS  1-2] 

_ (TRIALS  3-4] 

_ (TRIALS  6-6] 

feEIK# :  12  2 


MICROPHONE:  _ EXPERIENCE! 

TRAINING:  SUPERVISED 


NON-EXPERIENCEE 
NCN-SUPERV  ISH 


UTTERANCE  I  TPIAI  # 

!  l  !  2  I  2  ! 

THREE  1  !  j  ! 

EUROPE  !  !  !  I 

MOVE  IT  LEE T  i  I  !  j 

CARRIAGE  RiTUaN  Jit- 

LOGOUT  111! 

COMMAND  iili 

STRAIT  OF  HORMUZ  J  |  |  j 

TIM  !  (  !  ! 

KOREA  111! 

zef:  iiii 

CHG  DIR  TO  PCCCK  \  \  \  \ 


i 

« 

I 
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I 


ALPHA 

POSITIVE 

IDENTIFICATION 

LAUNCH 

RELOCATE 

DELTA 

TASK  ECHCE  CMEB 
KILO 

LOGIN  YELLEN 
ECHO 

NCVEMEER 

TTfcC 

UNITED  STATES 

FCUR 

ERAVO 

PL  CIRCLE  ON  MOSCOW 
ENEMY  DETECTION 
PROCEED 
ROMEO 

FLIGHT  CONTROLLER 
SEVEN 

GRND  CTRL  APPROACH 
REPORT 

AIRFIELD  NAME 
LIMA 


177 


I 


I 


I 


I 


I 


AVAILABLE
MESSAGE
SATELLITE
SHOOT
YANKEE
AFFIRMATIVE
CHARLIE
TORPEDO
FIVE
OPERATIONS PLAN
OFFENSE
UP IN DETAIL
NINE
PROB OF DETECTION
NEUTRAL
JULIETT
SPEED
UNIFORM
SENSOR
TANGO
CLOSE OUT CHARLIE
LOAD THE GANN
OSCAR
NORTH ATLANTIC MAP
PACIFIC DATA BASE
HUMAN FACTORS
FOXTROT
SOVIET
DEFENSE
ONE
INDIA
ADVANTAGES
GOLF
CANCEL
ZULU
NEGATIVE
PLOT ALL SUBMARINES
XRAY
REFUEL
AUTO RECOGNITION
QUEBEC
TRACK ENEMY
LEVEL TWO
COURSE
JOINT TASK FORCE
SIX
WHISKEY
ATTACK
SIERRA
MANEUVER DELAY
DISTANCE
EXECUTE
EIGHT
VICTOR
MEDITERRANEAN MAP
SEA OF JAPAN
POPPA
FILE TNSFR PROTOCOL
ALTITUDE
HOTEL
NUKE TILL THEY GLOW
ACCAT TITLE
MIKE
MISSILE


DATA REDUCTION

# NON-RECOGNITIONS
# MIS-RECOGNITIONS
# TOTAL ERRORS
APPENDIX I


MASTER LIST OF UTTERANCES


ONE SYLLABLE UTTERANCES (15)

ONE 

TWO 

THREE 

FOUR

FIVE

SIX 

EIGHT 

NINE 

GOLF

MIKE 

LAUNCH 

TIME 

SHOOT 

SPEED

COURSE 


TWO SYLLABLE UTTERANCES (35)

EUROPE 

LOGOUT 

ZERO 

SEVEN 

ALPHA 

BRAVO 

CHARLIE 

DELTA

ECHO

FOXTROT

HOTEL 

KILO 

LIMA 

OSCAR 

POPPA 

QUEBEC

TANGO 

VICTOR 

WHISKEY 

XRAY 

YANKEE 
ZULU

COMMAND

REPORT 

OFFENSE 

DEFENSE 

ATTACK 

PROCEED 

CANCEL 

MESSAGE 

DISTANCE 

NEUTRAL 

MISSILE 

SENSOR 

REFUEL


THREE  SYLLABLE  UTTERANCES  (20) 

MOVE IT LEFT
SOVIET 

JOINT  TASK  FORCE 

NOVEMBER

JULIETT 

ROMEO 

SIERRA 

INDIA 

UNIFORM 

KOREA 

NEGATIVE 

POSITIVE 

EXECUTE 

AIRFIELD NAME

ALTITUDE 

RELOCATE 

LOAD  THE  GANN 

LEVEL TWO

SATELLITE 

TORPEDO 


FOUR SYLLABLE UTTERANCES (14)

CARRIAGE RETURN
LOGIN YELLEN
STRAIT  OF  HORMUZ 
UNITED  STATES 
FLIGHT  CONTROLLER 
AVAILABLE 
AFFIRMATIVE 
UP  IN  DETAIL 


CLOSE OUT CHARLIE
HUMAN  FACTORS 
ADVANTAGES 
TRACK  ENEMY 
SEA OF JAPAN
ACCAT  TITLE 


UTTERANCES GREATER THAN OR EQUAL TO 5 SYLLABLES (16)

MANEUVER DELAY
CHANGE DIRECTORY TO POOCK
IDENTIFICATION
TASK FORCE COMMANDER
PLACE A CIRCLE ON MOSCOW
GROUND CONTROL APPROACH
ENEMY  DETECTION 
NORTH  ATLANTIC  MAP 
MEDITERRANEAN  MAP 
PROBABILITY OF DETECTION
OPERATIONS PLAN
PACIFIC DATA BASE
PLOT ALL SUBMARINES
AUTOMATIC RECOGNITION
FILE TRANSFER PROTOCOL
NUKE THEM TILL THEY GLOW
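
The partition of the master list by syllable count can be checked with a short sketch. The category sizes below are taken from the headings above (the four-syllable size of 14 matches the number of entries listed under that heading); the names CATEGORY_SIZES, category, vocabulary_size and category_share are hypothetical and are not part of the original experiment software.

```python
# Sizes of the syllable categories as stated in the master list headings.
# Key 5 stands for "greater than or equal to 5 syllables".
CATEGORY_SIZES = {1: 15, 2: 35, 3: 20, 4: 14, 5: 16}


def category(syllables: int) -> int:
    """Map a syllable count to its category key (5 means five or more)."""
    if syllables < 1:
        raise ValueError("an utterance has at least one syllable")
    return min(syllables, 5)


def vocabulary_size(sizes: dict) -> int:
    """Total number of utterances across all categories."""
    return sum(sizes.values())


def category_share(sizes: dict) -> dict:
    """Fraction of the vocabulary that falls in each category."""
    total = vocabulary_size(sizes)
    return {key: count / total for key, count in sizes.items()}
```

With these sizes the five categories sum to a vocabulary of 100 utterances, so each category size can also be read directly as a percentage of the vocabulary.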


APPENDIX  J 


INDIVIDUAL  SUBJECT  RECOGNITION  RATES 


The following are mean error rates for each subject
participating in the experiment. The data are
partitioned to mirror the groups established in the
overall experimental design and are expressed in percent
error.

GROUP  I  GROUP  II 


4.fcb 
7.17
7  .3b 

4.3b 


13.11 
y  .22 
6  .fey 
e.3y 


a  .22 
C  .44 
6.23 
fc  .06 

1  .cl 

2  .ey 
2.61 


c  r  ^ 
0  .  C.C. 

6  .6b 
6.72 
b  .33 
4.06 
2.00 
1.67 
GROUP III


GROUP  IV 


4.436 

10.11 

£.11 

15.1? 

.5e 

4.69 

B  .94 

15.72 

fa  .4:6 

B  .06 

y  .ee 

£  •  w  W 

u  7  £ 

8.44 

C  tkk 

6.26 

4 .5e 

2.39 

2  .fa4 

7.11 

£  .61 

4.33 
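
A minimal sketch of how a per-subject mean error rate in percent could be derived from the tallies on the data collection form follows. It assumes that total errors are the sum of non-recognitions and mis-recognitions and that each trial is one pass through the 100-utterance master list; both assumptions, the function names, and the example counts are illustrative and are not taken from the experimental data.

```python
from statistics import mean

# Assumption: one trial is a single pass through the 100-utterance vocabulary.
VOCABULARY_SIZE = 100


def total_errors(non_recognitions: int, mis_recognitions: int) -> int:
    """Assumed definition: total errors = non-recognitions + mis-recognitions."""
    return non_recognitions + mis_recognitions


def percent_error(non_recognitions: int, mis_recognitions: int,
                  utterances: int = VOCABULARY_SIZE) -> float:
    """Error rate for one trial, expressed in percent."""
    if utterances <= 0:
        raise ValueError("utterances must be positive")
    return 100.0 * total_errors(non_recognitions, mis_recognitions) / utterances


def subject_mean_error(trials) -> float:
    """Mean percent error over (non_recognitions, mis_recognitions) pairs."""
    return mean(percent_error(non, mis) for non, mis in trials)


# Hypothetical tallies for one subject over two trials (not real data).
example_trials = [(2, 1), (4, 3)]
example_rate = subject_mean_error(example_trials)
```

Under these assumptions a subject with 3 errors in one trial and 7 in another has a mean error rate of 5 percent, which is the form in which the values above are reported.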